IMAGE UNDERSTANDING WORKSHOP - OCTOBER 1977






Sponsored  by: 

Information  Processing  Techniques  Office 
Defense  Advanced  Research  Projects  Agency 


[Cover design labels: KNOWLEDGE REPRESENTATION; GENERIC CONCEPTUAL MODELS; IMAGES; VOCABULARY;
ARCH; AIRFIELD; AIRCRAFT; IMAGE PERCEPTION; NATURAL LANGUAGE.
Motto: A WORD IS WORTH A THOUSAND PICTURES]


Approved for public release
Distribution Unlimited


IMAGE UNDERSTANDING

Proceedings of a Workshop


Held at
Palo Alto, California
October 21, 1977


Sponsored  by  the 
Defense  Advanced  Research  Projects  Agency 


Report  N’ 


-37-54  \ 


Proceedings  Editor 


This  report  was  supported  by 
the  Defense  Advanced  Research 
Projects  Agency  under  DARPA 
Order  No.  3456  monitored  by  the 
Defense  Supply  Service,  Washington,  D.C. 


The views and conclusions contained in this document are those of the authors and should not be interpreted as
necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research
Projects Agency or the United States Government.




TABLE OF CONTENTS


FOREWORD

AUTHOR INDEX

SESSION I - HARDWARE (Moderator: Dr. William A. Sander; Army Research Office)

"Image-Processing Techniques Using Charge-Transfer Devices"
    G.R. Nudd, P.A. Nygaard, J.L. Erickson; Hughes Research Laboratories

"A CCD Histogram-Sorter: Feasibility Chip"
    T. Schutt, G. Borsuk, T.J. Willett; Westinghouse Defense & Space Center

"CCD Implementation of an Image Segmentation Algorithm"
    T.J. Willett; Westinghouse Defense & Space Center

"An Image Processor Architecture"
    P. Juetten, G. Allen; Control Data Corporation
    B. Hon, D.R. Reddy; Carnegie-Mellon University

SESSION II - SYSTEMS (Moderator: Mr. John S. Denhe; Army Night Vision Laboratory)

"Understanding Natural Texture"
    J.T. Maleson, C.M. Brown, J.A. Feldman; University of Rochester

"Relaxation Methods: Recent Developments"
    A. Rosenfeld; University of Maryland

"A Stereo Vision System"
    D.B. Gennery; Stanford University

"The MIDAS Sensor Database and its Use in Performance Evaluation"
    D.M. McKeown, D.R. Reddy; Carnegie-Mellon University

SESSION III - RECOGNITION (Moderator: Mr. Henry R. Cook; Defense Mapping Agency)

"Locating Structures in Aerial Images"
    R. Nevatia, K. Price; University of Southern California

"3-Dimensional Aircraft Recognition Using Fourier Descriptors"
    T. Wallace, P.A. Wintz; Purdue University

"Image Mensuration by Maximum A Posteriori Probability Estimation"
    J.W. Burnett, T.S. Huang; Purdue University

"Adaptive Threshold for an Image Recognition System"
    D. Serreyn, R. Larson; Honeywell Systems & Research Center


SESSION IV - REGISTRATION & SEGMENTATION (Moderator: Dr. Gordon Goldstein; Office of Naval Research)

"Using Synthetic Images to Register Real Images with Surface Models"
    B.K.P. Horn, B.L. Bachman; Massachusetts Institute of Technology

"Analytic Results of the Coleman Segmentor"
    H.C. Andrews; University of Southern California

"Progress Report of Segmentation Using Convergent Evidence"
    D.L. Milgram; University of Maryland




SESSION V - PROGRAM REVIEWS BY PRINCIPAL INVESTIGATORS
(Moderator: LTC David L. Carlstrom; Defense Advanced Research Projects Agency)

"Image Understanding at USC: April - September 1977"
    H.C. Andrews; University of Southern California

"Interactive Aids for Cartography and Photo Interpretation: Progress Report, October 1977"
    H.G. Barrow, et al.; SRI International

"Spatial Understanding Overview"
    T.O. Binford; Stanford University

"Overview of the Rochester Image Understanding Project"
    J.A. Feldman; University of Rochester

"Image Understanding and Information Extraction"
    T.S. Huang, K.S. Fu; Purdue University

"Automatic Image Recognition System, Program Status, September 1977"
    R. Larson; Honeywell Systems & Research Center

"Image Understanding Research at CMU: A Progress Report"
    D.R. Reddy; Carnegie-Mellon University

"Algorithms and Hardware Technology for Image Recognition, Project Status Report - September, 1977"
    D.L. Milgram, A. Rosenfeld; University of Maryland

"MIT Progress in Understanding Images"
    P.H. Winston; Massachusetts Institute of Technology


FOREWORD


"a word is worth a thousand pictures"*


In October 1977, the Image Understanding Program completed its second year of
effort as part of the planned five-year research effort to develop the technology
required for automatic and semiautomatic interpretation and analysis of military photographs
and related images. The Information Processing Techniques Office (IPTO) manages this
major Defense Advanced Research Projects Agency (DARPA) research program. Semi-annual
workshops have been held to provide cross-fertilization of research findings among the
various and diverse activities working on the program and to keep the operational user
community in close contact with the research community. Through these workshops, the
end users can provide guidance on the requirements, and the researchers can keep the users
apprised of progress and problems encountered as the work moves forward.


Lieutenant Colonel David L. Carlstrom, USAF, has been the director of the
program since its inception in late 1975. At this point in the evolution of the project,
LTC Carlstrom has observed, it is time to consider the various strategies for the
demonstration of results in a testbed environment. It was this tenet which the program
manager made the keynote point for the Fall 1977 workshop.


This document contains papers submitted by various research personnel engaged
in the major parts of the overall program and includes brief outlines of the progress
reports as detailed by the principal investigators.

The university/industrial teams and research agencies currently working on the
DARPA program are:

• University  of  Southern  California  - Hughes  Research  Laboratories 

• University of Maryland - Westinghouse, Incorporated

• Purdue  University  - Honeywell,  Incorporated 

• Carnegie-Mellon  University  - Control  Data  Corporation 

• Massachusetts  Institute  of  Technology 

• Stanford  University 

• University  of  Rochester 

• SRI  International 

• Honeywell,  Incorporated 


The fall 1977 workshop, the sixth in the series, was hosted by Dr. Thomas O.
Binford, Research Associate at the Artificial Intelligence Laboratory at Stanford
University. The meetings were held at the Holiday Inn, Palo Alto, California on 20-21
October 1977. In attendance were over 60 members of the research staffs of the
organizations named above, and military and civilian members of research laboratories and
staff agencies interested in the results of the program. This open "dialogue" served
the principal purpose of enhancing the potential for technology transfer so necessary
to ensure user acceptance of new and vital research.


* Quote  by  John  Kenneth  Galbraith  in  his  television  series  "The  Age  of  Uncertainty". 


The  cover  design  was  created  by  David  E.  Badura  and  Thomas  G.  Dickerson  of 
Science  Applications,  Inc.,  in  an  attempt  to  help  articulate  LTC  Carlstrom's  premise 
that  multiple  technologies  must  interact  in  a hierarchical  manner  to  convert  basic 
image  data  into  useful  knowledge  sources  for  use  by  decisionmakers. 

The Conference Organizer wishes to thank Dr. William Sander of the Army
Research Office, Mr. John Denhe of the Army Night Vision Laboratory, Mr. Henry Cook
of the Defense Mapping Agency, and Dr. Gordon Goldstein of the Office of Naval Research
for moderating the technical sessions. Also, Ms. Patte Woods of Stanford University
rendered valuable assistance in making the arrangements for the workshop and in
handling the requirements of the participants. Typing support, including mailings and
the collection and arrangement of the conference proceedings, was contributed by Mrs.
Kristin G. Johncox of Science Applications, Inc.


Lee S. Baumann

Science Applications, Inc.

Workshop Organizer


IMAGE-PROCESSING TECHNIQUES USING CHARGE-TRANSFER DEVICES


G.R.  Nudd,  P.A.  Nygaard,  and  J.L.  Erickson 


Hughes  Research  Laboratories,  Malibu,  California 


ABSTRACT 

This paper describes two charge-coupled device (CCD) circuits currently being
developed for image processing. A principal aim of the program is to develop the
technology necessary for the ultimate integration of both the sensing and processing
electronics on a single substrate. The circuits operate on a three-by-three array
of picture elements and perform algorithms such as edge detection, binarization, and
unsharp masking. The very low power-delay products inherent in this technology
(approximately 10[exponent illegible] pJ) allow highly parallel processing
configurations to be used on a single substrate, thereby increasing the processing
speed by several orders of magnitude. Further, by performing all the necessary
arithmetic operations in the charge domain, avoiding floating gate amplifiers, etc.,
significant advantages can be gained in terms of power, dynamic range, linearity, etc.


I.  INTRODUCTION 

Until recently, the speed and complexity of most image-processing algorithms
prevented integrated-circuit (IC) techniques from being used to process the data;
therefore, almost all processing has been performed on general-purpose digital
computers. Since the cycle time of a typical machine is a few microseconds, even the
simplest algorithm requires many seconds to execute. But n-channel MOS and CCD
technologies allow highly complex circuit functions to be built into single ICs that
can be clocked at rates in excess of 10 MHz. CCD technology is of particular interest
since it provides the opportunity to combine the sensor and the information processing
on one substrate. During the past several years, several CCD cameras have been
developed that operate at TV rates and provide a charge output which is directly
proportional to the image pixel intensity. By combining such detectors with the CCD
circuits to be discussed here, full frames of data can be processed in parallel as
illustrated in Figure 1. Such a combination might be able to process a full frame
in about 50 µsec.

Two test circuits are discussed. Test Circuit I performs the 3-by-3 Sobel algorithm.
Test Circuit II performs edge detection, low-pass filtering (or local averaging),
unsharp masking, binarization, and adaptive stretching on a 3-by-3 array. Details of
each circuit are given in Ref. 1. Circuit I has been fabricated, and preliminary test
results are included, together with photographs of the processed images for standard
test patterns. Test Circuit II has been designed and processed, and a preliminary
evaluation begun. In addition, we have built a full set of programmable drivers for
the CCD circuits and a computer-controlled test facility that can test a circuit
using any image stored on magnetic tape. The system can communicate directly with
commercial general-purpose machines (in this case the USC PDP-10) to obtain the
required data base, format the data appropriately for testing, and digitize the
processed data from the CCD output either for display in the laboratory or for
transmission back to the source computer.

Figure 1. Concept of monolithic image preprocessor.

II.  TEST  CIRCUIT  I 

Test Circuit I, a photograph of which is shown in Figure 2, is a two-phase n-channel
device with 7.5 µm gate lengths. It consists of a two-dimensional CCD array capable
of accepting three adjacent lines of data, with a floating gate electrode structure
to provide the weighting and arithmetic operations necessary to calculate the two
orthogonal edge components. The outputs from this array are connected to two parallel
CCD circuits (as shown in Figure 3); these calculate the absolute values and perform
the summation necessary for the full Sobel calculation:

S(e) = 1/8 { |(a + 2b + c) - (g + 2h + i)| + |(a + 2d + g) - (c + 2f + i)| }   (1)

Figure 2. Photograph of Test Circuit I.

Figure 3. Block schematic of Sobel circuit.
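
As an illustration of Eq. 1, the following is a minimal software sketch, not the CCD implementation. It assumes the window pixels a through i are laid out row by row with e at the center (a b c / d e f / g h i); the function names `sobel_3x3` and `sobel_image` are hypothetical and chosen only for this example.

```python
def sobel_3x3(window):
    """Evaluate Eq. 1 on one 3-by-3 window given as three rows of three values."""
    (a, b, c), (d, e, f), (g, h, i) = window
    # First edge component: top row minus bottom row, weighted 1-2-1.
    top_minus_bottom = (a + 2 * b + c) - (g + 2 * h + i)
    # Second edge component: left column minus right column, weighted 1-2-1.
    left_minus_right = (a + 2 * d + g) - (c + 2 * f + i)
    return (abs(top_minus_bottom) + abs(left_minus_right)) / 8


def sobel_image(img):
    """Apply sobel_3x3 at every interior pixel; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for col in range(1, cols - 1):
            window = [img[r + dr][col - 1:col + 2] for dr in (-1, 0, 1)]
            out[r][col] = sobel_3x3(window)
    return out


if __name__ == "__main__":
    # A vertical step edge: dark on the left, bright on the right.
    step = [[0, 0, 8, 8] for _ in range(4)]
    for row in sobel_image(step):
        print(row)
```

Only the three image lines currently inside the window are needed at any time, which is why the circuit accepts three adjacent lines of data.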


The processing of this test circuit is now complete, and the detailed testing of the
structure has begun. The test facilities developed for this program are based on an
IMSAI 8080 micro-computer; this computer accepts digital data from the University of
Southern California Image Processing Institute (USC IPI) data base and stores it in a
RAM memory. This is then converted to analog sampled data equivalent to the picture
intensities a through i and fed into the CCD array. The CCD drivers (which provide
the two-phase clocks and the reset and diffusion pulses) request data from the
computer at the appropriate times to simulate three adjacent lines of image data.
This data is applied to the input gates of the modified Tompsett inputs, and the
resulting charge is clocked through the array.

The performance of the Sobel operator can best be analyzed by viewing the circuit as
a two-dimensional combination of three transversal filters. The full Sobel output as
a function of time, S(t), can then be viewed as a combination of the two orthogonal
edge components S_x(t) and S_y(t) such that

S(t) = |S_x(t)| + |S_y(t)| ,   (2)

where

S_x(t) = 1/8 [ 1  2  1] [V1(t)  V1(t-T)  V1(t-2T)]^T
       + 1/8 [ 0  0  0] [V2(t)  V2(t-T)  V2(t-2T)]^T
       + 1/8 [-1 -2 -1] [V3(t)  V3(t-T)  V3(t-2T)]^T   (3)

S_y(t) = 1/8 [1  0 -1] [V1(t)  V1(t-T)  V1(t-2T)]^T
       + 1/8 [2  0 -2] [V2(t)  V2(t-T)  V2(t-2T)]^T
       + 1/8 [1  0 -1] [V3(t)  V3(t-T)  V3(t-2T)]^T   (4)


Here V1(t), V2(t), and V3(t) are the inputs to the three channels, and T is the
clock period. The impulse response of the three channels then corresponds directly
to the appropriate row vectors

[Eq. 5 is only partly legible; surviving fragments: 1/8 [1 2 1] and 1/8 [2 0 -2]]   (5)

The operation of the transverse filter array can be determined by examining the
impulse responses. The experimentally derived impulse functions for each of these
are shown in Figure 4 (a, b, and c). The outputs correspond directly to the vectors
in Eq. 5, and hence the weightings are being performed correctly. In these
preliminary tests, the relative values of the weights are not as accurate as
required. Since this could be due to incomplete charge transfer, we are investigating
techniques to improve this by small shifts in the relative phase of the two clocks.

To complete the Sobel algorithm, the absolute value of each component must be
evaluated separately and then summed, as described in Eq. 2. The circuit that
performs this operation is shown in Figure 5. It consists of a modified Tompsett
input device with a reference gate, B1, and a signal gate, SIG. When a positive
(with respect to B1) input signal is applied to the signal gate, a potential well
proportional to the signal is created under gates B2 and F. When the diffusion is
pulsed, charge flows over the signal gate and remains trapped until the output
clock is pulsed. Alternatively, if the signal is negative with respect to B1, charge
is trapped under the signal gate and F after the diffusion is pulsed. If the areas
of all the gates are equal, the charge transferred along the channel will depend
only on the magnitude of the input signal and be independent of the polarity. Hence,
the absolute magnitude operation will have been performed. An experimental
demonstration of this circuit is shown in Figure 6. Here the input signal is an
impulse of one pixel duration, equivalent to an image consisting of a dark vertical
bar on a light field. The output signal from the CCD filter, which corresponds to
S_y(t), is shown in the top trace. It consists of two output signals corresponding
to the leading and trailing edges of the bar. Note the polarity change at the two
edges of the bar, corresponding to a change of intensity from high to low and vice
versa. When this signal is applied to the absolute value circuit, the output is as
shown in the lower trace. The two signals are now converted to the same polarity,
corresponding to the start and end of the bar (as required by the true Sobel
algorithm). We conclude from these waveforms that edge detection, according to
Eq. 1, is being performed.

In addition to the detailed electrical testing of these circuits, we have made
considerable advances in the computer-controlled test facility required to evaluate
the circuit using the USC data base. This system is briefly described in Section IV.
We have modified a commercial display to provide a resolution of 128 by 128 pixels
with four-bit intensity, and generated several test patterns. Examples of test
patterns that demonstrate the operation of the circuit are given in Figures 7 and 8.
The processed outputs for these images are shown in Figures 7(b) and 8(b), where the
detected edges of each line determined by Eq. 1 are displayed.

The speed of this process is presently limited by the micro-computer to about 4 kHz,
requiring approximately 4 sec to process a full frame. The CCD circuits themselves
run several orders of magnitude faster than this, and we are currently investigating
techniques to decrease the time required to access the data and perform the
necessary conversions between analog and digital data. Nevertheless, this
demonstrates the CCD concepts for edge detection.

Figure 4. Impulse response for three channels of the array; panel (c) is the bottom channel.

Figure 5. Schematic of CCD absolute value circuit.

Figure 6. Experimental evaluation of the CCD absolute value circuit.

Figure 7. Example of CCD Sobel operation on test imagery with vertical symmetry.
Resolution is 128 x 128 pixels of 4 bits each.

Figure 8. CCD operation on test imagery with horizontal symmetry.

III. TEST CIRCUIT II

Test Circuit II performs the five algorithms given in Eqs. 1, 6, 7, 8, and 9.

Local averaging:      f_m = 1/9 (a + b + c + d + e + f + g + h + i)   (6)

Unsharp masking:      f_um = (1 - α)e + α f_m   (7)

Binarization:         [equation not legible]   (8)

Adaptive stretching:  f_as = 2 min(e, r/2)       for f_m < r/2
                      f_as = 2 max(e - r/2, 0)   for f_m > r/2   (9)

The purpose of the circuit is to investigate the possibility of performing adaptive
processing using the local mean as the control function. The circuit is designed in
modular form: each algorithm is performed in a single integrated circuit on the one
chip. Coaxial interconnects or wire bonds at the chip surface will be used to achieve
the full processing capability. This circuit has been designed and processed using
design rules similar to those used for Test Circuit I. A photograph of the full
circuit is shown in Figure 9. Performance evaluation and testing of this circuit are
expected to be completed during the next two months. Details of its operation will
be included in the report for April 1978.
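
The local-mean-controlled operations of Test Circuit II can be sketched in software as follows. This is a hedged illustration only: it assumes Eq. 6 is a plain nine-pixel average, Eq. 7 is a weighted blend of the center pixel e and the local mean with weight alpha, and Eq. 9 (partly illegible in the source) switches between two stretch branches on whether the local mean is below r/2, with r taken to be the full intensity range. The function names are hypothetical, and binarization (Eq. 8) is omitted because its formula is not recoverable.

```python
def local_average(window):
    """Eq. 6 (as read here): mean of the nine pixels in a 3-by-3 window."""
    return sum(sum(row) for row in window) / 9.0


def unsharp_mask(window, alpha):
    """Eq. 7 (as read here): blend of center pixel e and the local mean."""
    e = window[1][1]
    return (1 - alpha) * e + alpha * local_average(window)


def adaptive_stretch(window, r):
    """Eq. 9 (assumed reading): stretch chosen by the local mean.

    Dark neighborhoods (mean below r/2) expand the lower half of the range;
    bright neighborhoods expand the upper half.
    """
    e = window[1][1]
    if local_average(window) < r / 2:
        return 2 * min(e, r / 2)
    return 2 * max(e - r / 2, 0)


if __name__ == "__main__":
    spot = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    print(local_average(spot))        # mean of an isolated bright pixel
    print(unsharp_mask(spot, 0.5))    # halfway between pixel and mean
    print(adaptive_stretch(spot, 16))
```

All three functions share `local_average`, mirroring the paper's stated aim of using the local mean as the control function.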


IV.  IMAGE  PROCESSING  TEST  FACILITIES 

In addition to developing the concept, designing the circuit, and laying out and
fabricating the two test chips, it was necessary to build the clocks and drivers for
testing and to provide the correct data interface. A significant amount of work has
been undertaken in this area, and substantial progress has been made. A
computer-controlled test facility, shown in Figure 10, has been built to perform the
circuit testing; it can receive and transmit digital data through an asynchronous
interface to any general-purpose computer having a time-share capability. It enables
us to access a very large data base and retransmit processed data. The original
images are stored in the RAM of our test facility and converted to analog format for
processing. Three lines of data representing three adjacent lines of the image are
then fed as inputs to the CCD circuits. The correct timing between the clocks, etc.,
and the valid data is maintained by a master clock that slaves both the computer and
driver box. A single line of processed data is then taken from the CCD circuits
through an analog-to-digital converter and fed to the RAM, where the processed
picture is stored.

From this location, the data can either be displayed locally or sent back to a
main-frame computer. The speed of this operation is currently limited by the cycle
time of the micro-processor to about 5 kHz, but we are working on improving this.

This system is as flexible as possible to allow the phase of the clocks, the
diffusion pulses, and the resets to be programmed externally. In addition, all the
necessary software required to generate test patterns, to perform the calibration,
and to provide the correct sequence of data equivalent to a continuous three-line
scan has been completed, and the system is operational. More complete information
regarding this is given in the USC IPI Semiannual Report dated September 1977.

Figure 9. Photograph of Test Circuit II.

Figure 10. Schematic of test set-up.
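
The throughput figures quoted in the paper can be checked with simple arithmetic. The sketch below assumes one pixel is processed per micro-computer cycle and a 128 x 128 test frame; the function name `frame_time_seconds` is hypothetical. The paper quotes about 4 kHz in one place and about 5 kHz in another, so both rates are shown.

```python
def frame_time_seconds(width, height, pixel_rate_hz):
    """Time to pass every pixel of a frame through at a given pixel rate."""
    return (width * height) / pixel_rate_hz


if __name__ == "__main__":
    for rate in (4_000, 5_000, 10_000_000):
        t = frame_time_seconds(128, 128, rate)
        print(f"{rate:>10} Hz -> {t:.6f} s per 128 x 128 frame")
```

At 4 kHz this gives roughly 4.1 s per frame, in line with the "approximately 4 sec" stated for the micro-computer-limited setup, while a single channel clocked at the 10 MHz rate mentioned in the introduction would need under 2 ms.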

V.  CONCLUSIONS 

Significant progress has been made in three areas: the testing and performance
evaluation of the Sobel operator, the detailed design and fabrication of Test
Chip II, and the design and building of the necessary test facilities.

Figures 4 through 8 show the performance of the CCD Sobel operator, thus validating
the original concepts and the design. Further experimentation is continuing to
increase the speed and accuracy and to test the operation using a more extensive
data base. The results of this will be reported in a future paper. Test Circuit II
has now been designed and processed, and we have started a full evaluation of it.
Finally, we have installed test facilities to provide the necessary flexibility to
drive the wide range of circuit elements involved and to provide the necessary data
link with other stand-alone machines in the community.


A CCD HISTOGRAM-SORTER: FEASIBILITY CHIP


ABSTRACT

Under contract to the University of Maryland, Westinghouse has been implementing
algorithms for use in the target cueing process on the focal plane of imaging
sensors. The program is sponsored by DARPA and monitored by the Army's Night Vision
Laboratory. It has resulted in an examination of the latest advances in CCD
technology and led to the design of innovative structures which require very small
chip area. This paper is a description of a histogram-sorter feasibility chip in CCD
which was built for the program.


The Smart Sensor Project is scheduled to last for 21 months, with a key circuit
selected at the one-year mark and constructed in the last nine months. We wanted to
select a circuit which is common to as many of the algorithms as possible.

Figure 1 shows the University of Maryland algorithms and the functions which are
required by each algorithm. A perusal shows that the histogram-sorter function
occurs in four out of the five algorithms, and it is the one we selected.

Several versions of the sorter, buried channel and surface channel CCD, were put in
production runs at Westinghouse Advanced Technology Laboratory. Both versions assume
that the analog signal has already been quantized into a thermometer code. A
physical analogy to the thermometer code is seen in Figure 2, which shows a
container filled with an amount of water (charge), proportional to the signal
voltage S, being poured into a tray of quantized bins. When a bin is filled with
water, the water flows over the top into the next bin. The volume of water is
divided between a number of discrete bins.

Figure 2. Flow Analogy to Charge Quantizer

The buried channel CCD sorter takes the charge, q, residing in each bin and receives
them in parallel as in Figure 3. Thus, there is q amount of charge shifted from bin
b1 to channel I1, q amount of charge shifted from bin b2 to channel I2, and so on.
By means of the three pg gates, the contents of channels I1 through IN are shifted
in parallel to the large holding well (LHW). The large holding well is partitioned
into N channels also. Consider a numerical example: a sequence of numbers 4, 7, 5 is
quantized at q = 1 so that 4, 7, and 5 bins respectively represent each number. Then
Figure 3 shows the sequence as it goes through the quantizer, the b1, b2, ... bN
bins, the I1, I2, ... IN channels, and the large holding well. It also shows the
removal sequence from the large holding well and the remainder at each stage. The
numbers are removed in order of decreasing magnitude (7, 5, 4), which shows the
numbers have been sorted by magnitude.

Figure 4. Sorting Sequence

The surface channel CCD sorter is shown in Figure 5.

Figure 5. Sorter Fabrication

Here the sorter consists of individually controlled CCD shift registers. Each
register is enabled by a clock pulse (CP) and the presence of a charge quantum. The
lower portion of Figure 5 shows the numbers 3, 1, 5 and 2 entering the sorter. At
stage 2, only the first register will shift. At stage 3, all five registers will
shift; note that the data is then arranged in descending order.
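
The buried channel sorting example (4, 7, 5 removed as 7, 5, 4) can be mimicked in software. The sketch below is one interpretation of the description, not a model of the device: it assumes each number fills its first n bins with one quantum q, that channel k of the holding well accumulates the quanta from bin k of every number, and that each removal step takes one quantum from every non-empty channel. The names `thermometer_code` and `sort_by_holding_well` are hypothetical.

```python
def thermometer_code(value, n_bins, q=1):
    """Quantize a value into a thermometer code: the first value//q bins are full."""
    filled = min(value // q, n_bins)
    return [1 if k < filled else 0 for k in range(n_bins)]


def sort_by_holding_well(values, n_bins, q=1):
    """Sort values in descending order by accumulating thermometer codes.

    well[k] counts how many inputs filled bin k. Each removal step drains one
    quantum from every non-empty channel; the number of channels drained is
    the next-largest value.
    """
    well = [0] * n_bins
    for v in values:
        for k, bit in enumerate(thermometer_code(v, n_bins, q)):
            well[k] += bit

    removed = []
    for _ in values:
        level = sum(1 for count in well if count > 0)
        removed.append(level * q)
        well = [count - 1 if count > 0 else 0 for count in well]
    return removed


if __name__ == "__main__":
    print(sort_by_holding_well([4, 7, 5], n_bins=8))
    print(sort_by_holding_well([3, 1, 5, 2], n_bins=8))
```

Under these assumptions no comparisons are made: the ordering falls out of how the accumulated charge is distributed across channels, which is consistent with the paper's interest in structures that need very little chip area.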


We had been using surface channel CCDs to produce image primitives such as those
produced by the Median Filter and Non-Maximum Suppression Operators because we
already had lab data on their performance. However, we analyzed the histogram-sorter
to be one tenth as large with buried channel CCD devices, so we amended the chip
feasibility program to include buried channel devices also. The second version of
the sorter was done with surface channel CCDs and the first version was in buried
channel. The buried channel device was achieved by ion implantation in the surface
channel structure to meet the time schedule. Probe tests showed that the yields were
not high enough to continue processing the buried channel wafers, so they were
dropped. The surface channel devices are used in the demonstration.


[text illegible] the buried channel [text illegible] mechanical assembly will be
available at the DARPA meeting in mid-October, with the demonstration scheduled for
November. One portion of the demonstration unit is shown in Figure 7 with the shift
registers mounted in place. The ten shift registers are seen at the top of the unit,
and ten thumbwheel switches are shown below. These thumbwheels represent the
unsorted numbers which the sorter must rearrange in descending order. The observer
may dial in any arrangement of numbers which he wishes. The inputs and outputs,
i.e., the unsorted and sorted arrangements, will be shown on a two-trace
oscilloscope.

Figure 6. Buried Channel Wafer

Figure 7. Demonstration Unit


CCD IMPLEMENTATION OF AN IMAGE SEGMENTATION ALGORITHM


ABSTRACT

Under contract to the University of Maryland, Westinghouse has been implementing
algorithms for use in the target cueing process on the focal plane of imaging
sensors. The program is sponsored by DARPA and monitored by the Army's Night Vision
Laboratory. It has resulted in an examination of the latest advances in CCD
technology and led to the design of innovative structures which require very small
chip areas. A CCD implementation of an image segmentor is described here.


The purpose of the Connected Components Algorithm is to segment an image frame into
object regions; these object regions are potential shapes of interest, and features
are extracted from them for classification purposes. We assume that Time Delay
Integration is part of focal plane signal processing, which implies that the image
comes to the cuer one line at a time, i.e., the pixels in one line of the image
arrive in parallel. The Connected Components Operator then moves along the line of
pixels, with the previous line in memory, determining which pixels are part of a
particular object region or whether a new object region is starting. If we are to
extract features from each object region, there must be a means for distinguishing
between different object regions. One approach to the problem is to paint each
object with a different color (analog voltage level) and then have a feature
extractor assigned to each color (voltage level). Where an object has several
colors, the feature extractors corresponding to those colors accumulate their
features, dump them into a scratch feature extractor to combine them, and reassign
the results to one of the two feature extractors.

Assume that the original image has been
thresholded and the result is in binary form, with
gray levels exceeding the threshold shown as 1's
in Figure 1. One image line is retained in memory
so that each pixel can examine its neighbors to
the left and above. No diagonal connections
are permitted under this convention, and an adjacent
(horizontal or vertical) pixel must be occupied
in order to make a connection. No skips or
gaps are allowed, and the computations start one
pixel in from the edge. In Figure 1b, there are
four distinct regions A, B, C, and D. The only
possible connection between regions B and C is
through a diagonal, which is not allowed. Computations
for the fourth row are seen in Figure 1c.
Here, there is a connection between regions B
and C, and an equivalence statement, B ~ C, is
carried along. At the end of the sixth row, there
is another connection between C and D (C ~ D), and
all the regions are completed as seen in Figure
1d.

The system block diagram is shown in Figure
2. The pixels are read from the top of the image
to the bottom, from left to right. The delay line
is represented by twenty (20) SI/SO CCD delay
lines which are coded to obtain 100 colors (analog
voltages) and obviate transfer efficiency
problems. There are 20 levels of color comparisons
for horizontal and vertical connections in
the Coloring Operator. The Equivalence Box notes
horizontal and vertical connections between
different colors, recolors a pixel if necessary,
and notes when a color is no longer being used,
thus activating the equivalence statement between
two different colors. The column clock is
actually fed to all the Feature Extractors, and
they indicate when a color is no longer used.
The device which selects the appropriate Feature
Extractor is a decoder. The Feature Extractors,
which accumulate object features such as area
and perimeter, together with the Scratch Feature
Extractor, form the basis for classification
decisions.

The desired data rate is 1 megapixel/sec,
and to achieve this with surface channel CCD's
at an assumed rate of 50 - 100 kHz requires
multiple Connected Component Operators, in the
same manner as multiple Median Filter, Gradient,
or Non-Maximum Suppression Operators. That is,
the line delay would be divided into a number of
vertical columns. The object region must still
be constructed across columns, since an object
region may be 100 pixels or 4 columns wide. One
can begin to envision a hierarchy or branching
structure of Connected Component Operators. The
point here is that with the primitive operators,
we were able to break the image into columns because
the operators were relatively independent
between columns. This is not the case here, and
it appears that one Connected Component Operator
is desirable.
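
The number of parallel operators implied by these rates can be checked with a few lines of arithmetic. The sketch below assumes that each operator runs at the CCD clock rate and that the pixel load is shared evenly; the function name is illustrative and does not come from the paper.

```python
import math

def operators_needed(pixel_rate_hz: float, ccd_rate_hz: float) -> int:
    """Smallest number of parallel operators whose combined rate meets pixel_rate_hz."""
    return math.ceil(pixel_rate_hz / ccd_rate_hz)

PIXEL_RATE_HZ = 1_000_000                 # desired rate in the text: 1 megapixel/sec
for ccd_rate_hz in (50_000, 100_000):     # assumed surface channel range in the text
    print(ccd_rate_hz, operators_needed(PIXEL_RATE_HZ, ccd_rate_hz))
```

Under these assumptions the column count falls between 10 and 20, which is the scale of partitioning that a single Connected Component Operator would avoid.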

A ramification of one Connected Component
Operator is that the pixels are moving through the
delay line at the rate of 1 megapixel/sec. To
ensure numerical integrity, we shall organize the
decoder taking the delay line output in the form
of a Field Programmable Logic Array. The color
(analog voltage) is quantized to ten levels, and
the highest bin containing a quantum of charge is
identified and shifted into a delay line corresponding
only to that bin. This means that 20
delay lines and a decoder are necessary to carry
100 colors.
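
One reading of the figure of 20 delay lines for 100 colors is that a color is carried as two decimal digits, each held on one of ten delay lines. The sketch below illustrates that reading only; the text does not spell out the coding, and the function names and the line numbering (0-9 for tens, 10-19 for units) are assumptions.

```python
def active_lines(color: int) -> list[int]:
    """Return a 20-element vector with one active line per decimal digit.

    Lines 0-9 carry the tens digit and lines 10-19 the units digit
    (an assumed layout, not one given in the paper).
    """
    if not 0 <= color <= 99:
        raise ValueError("color must be in 0..99")
    tens, units = divmod(color, 10)
    lines = [0] * 20
    lines[tens] = 1
    lines[10 + units] = 1
    return lines

def decode_lines(lines: list[int]) -> int:
    """Recover the color from the two active lines."""
    tens = lines[:10].index(1)
    units = lines[10:].index(1)
    return 10 * tens + units
```

With this layout each delay line only has to distinguish "charge present" from "charge absent", which is consistent with the stated aim of avoiding transfer efficiency problems.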

We assume that the Coloring Operator processes
a binary image, i.e., each pixel contains either a
one (1) or a zero (0). The binary data stream
will enter the Coloring Operator and emerge transformed
into different colors or signal levels for
different shapes. Since the image data is read
out serially, the Coloring Operator is a local
operator. The Coloring Operator is a transformation
from a binary picture to a color one by a
mapping T

$$T(A, B, M) : C \rightarrow C'$$

where C' is the color of the transformed pixel C,
the variables A and B represent nearest neighbors
of C, and M represents an available color. The
relative locations of pixels A, B, and C in the
image plane are shown below.

        B
    A   C


We define the coloring window as always containing
these three elements. Elements A and B are nearest
neighbors of C and have already been processed by
the Operator. Element B is located one horizontal
line above elements C and A. Element C is painted
by the Coloring Operator according to the following
rule:

For $C \neq 0$,

$$C' = \begin{cases} A & \text{if } A \neq 0,\ B \geq 0 \\ B & \text{if } A = 0,\ B \neq 0 \\ M & \text{if } A = 0,\ B = 0. \end{cases}$$

When adjacent elements have different colors, the
element being painted assumes the color of the
nearest neighbor in the same line (rows dominate).
Whenever elements A and B are zero and element C
is not zero, element C' is given a new color.
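
The three-case rule can be sketched in software as follows. This is a minimal illustration, not the CCD implementation: it assumes that new colors M are handed out by counting up from 1, and that neighbors falling outside the image are treated as zero (the paper instead starts the computation one pixel in from the edge). The function name is hypothetical.

```python
def color_image(binary):
    """Apply the coloring rule row by row and return a grid of color numbers."""
    next_color = 1                      # M: next unused color (assumed to count up)
    colored = []
    for r, row in enumerate(binary):
        out = []
        for c, pixel in enumerate(row):
            a = out[c - 1] if c > 0 else 0           # A: left neighbor, same line
            b = colored[r - 1][c] if r > 0 else 0    # B: neighbor one line above
            if pixel == 0:
                out.append(0)           # background stays uncolored
            elif a != 0:
                out.append(a)           # rows dominate: take the left neighbor
            elif b != 0:
                out.append(b)           # otherwise take the color from above
            else:
                out.append(next_color)  # isolated so far: start a new color
                next_color += 1
        colored.append(out)
    return colored

image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
for line in color_image(image):
    print(line)
```

In this small image the left shape keeps color 1 through its vertical connection, while the right shape has no occupied neighbor when it first appears and so receives color 2.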

To include multicolor capability, we want to
access the Equivalence Operator whenever

For $C \neq 0$, $A \neq B \neq 0$.

Now we face problems such as which direction to
actuate the equivalence statement, when to actuate
it, and how to facilitate it.
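
One way to explore the "when" question is a per-color timer that counts pixel clocks and closes a color out once more than a full image line has passed without the color being continued. The sketch below is a software illustration under stated assumptions, not the charge-domain design: the timer is reset whenever the color is assigned to a pixel (which covers the vertical connection named in the paper as well as horizontal runs), equivalence statements are kept in a plain list, and the function name is hypothetical.

```python
def scan(binary, width):
    """Color a binary image; return (equivalence statements, colors in close-out order)."""
    next_color = 1
    timers = {}          # color -> pixel clocks since the color was last assigned
    equivalences = []    # (A, B) pairs noted when C != 0 and A, B differ and are nonzero
    closed = []
    above = [0] * width  # previous image line, standing in for the delay line
    for row in binary:
        current = []
        for col, pixel in enumerate(row):
            for color in timers:             # every timer advances on each clock
                timers[color] += 1
            a = current[col - 1] if col > 0 else 0
            b = above[col]
            if pixel == 0:
                c = 0
            elif a != 0:
                c = a
            elif b != 0:
                c = b
            else:
                c = next_color
                next_color += 1
            if c != 0:
                timers[c] = 0                # assumed reset rule (see lead-in)
                if a != 0 and b != 0 and a != b and (a, b) not in equivalences:
                    equivalences.append((a, b))
            for color in [k for k, t in timers.items() if t > width]:
                closed.append(color)         # no continuation for a full line
                del timers[color]
            current.append(c)
        above = current
    return equivalences, closed

u_shape = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(scan(u_shape, 5))
```

For the U-shaped object the two arms start with different colors, the bottom row produces one equivalence statement, and the color that is not continued closes out first, so its features would be directed into the color that remains active.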


The equivalence statement is actuated when
one of the component colors is closed out; in order
to continue a color (and not close it out), there
must be a vertical connection somewhere along the
next image line. One possibility is to construct
a timer for each color such that a vertical connection
would reset the timer. If the timer were
allowed to reach 600 (the image is 600 pixels long),
it would enable the Equivalence Operator and close
out the color. The timer could take the form of
one of the channels of the large holding well
found in the Feature Extractor for each color. At
the rate that the pixels are shifted in the system,
a quantum of charge would be entered in the timer
channel; the accumulated charge would be non-destructively
read out and compared to, say, 600. If
the accumulated charge were larger than the amount
equivalent to 600 shifts, the color would be
closed out. If there were a vertical connection,
the contents of the timer channel would be reset
(if <600) to zero, and the accumulation started
again for that color. Note that each color has
its own timer and each is updated at every clock
pulse. That component color which is closed out
is directed into the color component of the
equivalence statement which is not closed out.

There is one other logic step in implementing
the equivalence statement, and that is recoloring
the previous line when a vertical connection
between different colors is detected. This is
necessary to form a connection in a tri-colored
object region, for example. If we have
the equivalence statements

1 → 3

2 → 1

such an object will be treated as two object
regions. Recoloring the previous line will restate
the equivalence statements as

3 → 1

2 → 1.

There is a Feature Extractor corresponding to
each color; the signals from the Equivalence
Operator enable the particular Feature Extractor
for the equivalence computations. They also
direct which Feature Extractor will receive the
contents of the Scratch Feature Extractor, i.e.,
the color still active. The Feature Extractors
themselves are visualized as a many-channeled,
large holding well which follows along the lines
of the histogram-sorter. Each channel would
correspond to a particular feature, and since the
features are linear, they would simply add in the
Scratch Feature Extractor.

Figure 1c. Computations for Fourth Row

SUMMARY

Demands on image processing systems have been
increasing as digital systems have been developed
to support efficient signal-level and symbolic-level
processing, and this increase in demand is
expected to accelerate sharply in the near future.
This paper reports on the progress by CMU and CDC
on a joint architectural research effort to develop
a processor concept to anticipate future image
processing needs. This processor is being designed,
using state-of-the-art technology and automated
design techniques, as an add-on for general-purpose
host computers, and includes a number of functional
units (with I/O and memory considered as functional
units). The program-configurable structure is
instrumental in providing both high parallelism and
pipelining for throughput optimization. In order
to provide the highest degree of system integrity
for efficient computation, hardware and software
are being developed concurrently. This includes a
machine organization, a two-pass assembler, and a
simulator.


INTRODUCTION 

It is anticipated that current capabilities to
process images for military applications will soon
be inadequate. These increasing demands for image
processing power can be attributed to three components
of change. First, more imagery data is being
collected to produce a greater number of processed
images. Second, an increasing fraction of collected
imagery data is being sensed or recorded in
digital form. Finally, greater intelligence on
the part of the image processing system is being
demanded. These changes will require improvements
of image processing systems in both signal-level
processing and symbolic-level processing.

Signal-level processing is largely a matter of
arithmetic manipulations, comprising tasks such
as transformation, interpolation, and filtering.
These signal-level tasks are readily supported by
currently available numeric signal processors. To
some degree, increasing signal-level demands can be
met by brute-force increases in computer power with
networks of available processors. Unfortunately,
increases in the numbers of digital imagery sensors
and image users tend to suggest that processing
demands will increase by two orders of magnitude
over the next few years. Such an increase
will have to be met by both an increase in signal-level
performance and a shift of some of the burden
from the signal level to the symbolic level.

Symbolic-level processing is concerned with the
detection of and relationships among entities
having certain semantic attributes, and involves
symbol manipulating tasks such as searching, decision
making, and path finding. Available numerically
oriented processors are poorly suited for
these tasks. This is particularly true of high
performance signal processors which rely heavily
on pipelining, since their pipelines generally have
to be flushed and refilled with each decision
branch. While many non-numeric processors have
been proposed, and a few built, there are two important
disadvantages to attempting to satisfy increasing
image processing demands through a network
of numeric and non-numeric processors. First, the
net would be inhomogeneous, leading to some problems
in hardware maintenance and probably severe
problems in interfacing, software development,
and software change. The second disadvantage is
major and relates to the difficulty of smoothly
shifting the processing burden from the signal
level to the symbolic level.

The preceding observations suggest a need for
a processor with both improved signal-level computing
power and high performance at the symbolic
level of image processing. In the paragraphs which
follow, the current status of an architectural
research project performed jointly by Control Data
Corporation (CDC) and Carnegie-Mellon University
(CMU) to develop such a processor is described.

Thus far, a preliminary, block-level design has
been adopted for the machine organization. In
addition, software tools are being developed to
support the design work and applications developments.
Initial versions of an assembler and a
register-level simulator have been constructed.
These tools will be used to evaluate the performance
of alternative hardware configurations.

GENERAL  CONSIDERATIONS 

In order to focus the architectural research
effort on functional capabilities for a next-generation
image processor, the assumed application of
the processor has been as an add-on to a general-purpose
host computer, as illustrated by Figure 1.
This environmental restriction not only allows a
more productive hardware design effort, but substantially
simplifies the simulator and software
development. Little generality is lost through
the restriction, since augmenting the input/output
performance for network applications will require
modification of only two of ten substructures of
the processor. The I/O capability has been limited
to supporting buffered block transfers between host
and processor memories, and no interrupt facilities
have been provided. Other constraints which have
been adopted in keeping with the goal of high
signal-level and symbolic-level processing performance
include a commitment to subnanosecond ECL LSI
technology and 16-bit integer words, with multiple
precision and byte-oriented capabilities. (No
floating-point hardware is included.)

Figure 1. Connection to Host Computer

The main components of the processor at the
block level are shown in Figure 2, and include 1)
a set of function units, and 2) instruction memory
and associated control logic. The processor operates
in a synchronous manner with a basic cycle time
of 15-20 nanoseconds. Instructions are fetched
from the instruction memory at this rate, and functional
unit execution proceeds at the same rate.
In most cases, two cycles are required to complete
an instruction; during the first cycle the operands
are fetched, and execution of the function occurs
during the second cycle.


A collection of relatively autonomous functional
units implements the arithmetic and logical functions
of the machine. Each unit may execute a
simple operation such as an add, or a more complex
operation which may take many cycles to complete.
Most of the functional units have two inputs and
one output, as well as inputs for the control of
internal logic. The input operands are latched in
local registers; this allows temporary results to
be held as input operands for the next function
without having to store them in data memory. Data
memory is, in fact, controlled like any other functional
unit.

Functional units included in the machine are:
two adders, a multiplier, a logical unit, a barrel
shifter/mask unit, an I/O box, and the control
unit. In addition to appropriate operand registers,
each functional unit has hardware to compare
the result of an operation with a previously defined
quantity held in an auxiliary register in the
unit. The results of these comparisons are available
at all times and may be sensed by the control
unit for use in conditional execution of instructions.

CONTROL 

The instruction memory is loaded by DMA transfer
from the host computer. The instructions are
subdivided into several fields: four fields for
controlling four different functional units, a
constant field which doubles as the destination
address for a branch, and a field which controls
the processor's data paths. Any instruction may
initiate activity in four functional units simultaneously,
allowing operand fetches to overlap the
execution of functions.

Figure 2. Preliminary Block Diagram

The execution of instructions is normally sequential.
This may be altered by executing an
appropriate function in a special functional unit
called the control unit. The control unit is addressed
the same as the other functional units;
thus a branch instruction is performed by placing
the proper code in one of the functional unit
fields of the instruction. This unit also takes
care of special instructions (e.g. HALT).
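
The instruction layout described in this section (four functional unit fields, a constant field, and a data path field) can be pictured as a packed word. The sketch below is illustrative only: the 15-bit functional unit width is the figure given in the assembler description, while the widths of the constant and data path fields and the ordering of the fields are assumptions, as are the function names.

```python
FU_BITS = 15      # functional unit field width, as given in the assembler description
CONST_BITS = 16   # assumed width of the constant / branch address field
PATH_BITS = 16    # assumed width of the data path control field

def pack(fu_fields, constant, path):
    """Pack four FU fields, a constant, and a data path field into one integer."""
    if len(fu_fields) != 4:
        raise ValueError("exactly four functional unit fields are required")
    word = 0
    for field in fu_fields:
        if not 0 <= field < (1 << FU_BITS):
            raise ValueError("functional unit field out of range")
        word = (word << FU_BITS) | field
    if not 0 <= constant < (1 << CONST_BITS) or not 0 <= path < (1 << PATH_BITS):
        raise ValueError("constant or data path field out of range")
    word = (word << CONST_BITS) | constant
    return (word << PATH_BITS) | path

def unpack(word):
    """Inverse of pack: return (fu_fields, constant, path)."""
    path = word & ((1 << PATH_BITS) - 1)
    word >>= PATH_BITS
    constant = word & ((1 << CONST_BITS) - 1)
    word >>= CONST_BITS
    fu_fields = []
    for _ in range(4):
        fu_fields.append(word & ((1 << FU_BITS) - 1))
        word >>= FU_BITS
    fu_fields.reverse()
    return fu_fields, constant, path
```

A branch under this picture is simply an instruction whose control unit field holds the branch code and whose constant field holds the destination address.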

Instruction memory and data memory are both
loaded via DMA transfer controlled by the I/O functional
unit. The control registers in the I/O box
may be loaded by either the processor or the host
computer. This allows the host to load a program
into the processor and start it. This also allows
a program executing in the processor to overlay itself.

SIGNAL-LEVEL  PROCESSING  FACILITIES 

Low-level image processing algorithms frequently
contain short inner loops which must be executed
for each pixel. In these cases, memory
bandwidth limitations may dominate the total execution
time for the algorithm. This problem is
avoided by the fast data memories incorporated in
the processor design. The presence of two independent
memories allows the processor to acquire two
new operands on each cycle, or to store a result
and acquire an operand. The host machine may also
be loading or retrieving data from these memories
as the processor is running. Since an intermediate
result can be held in the input register of a functional
unit for the next operation, the number of
memory accesses is reduced. By chaining operations
in this fashion, it should be possible to execute
many instructions between memory references.

SYMBOLIC-LEVEL  PROCESSING  FACILITIES 

The processor design is intended to provide an
effective level of performance for high-level processes
such as those encountered in symbolic manipulation
tasks, which require different capabilities.
Here algorithms are more complex and usually require
a considerable amount of decision-making. A
number of the design features allow the processor
to execute these tasks efficiently.

Multiway branches can be executed in one instruction,
easing the programming of algorithms
having branches to one of many paths (such as for
speech recognition and graph searches). Symbolic
processing algorithms may employ data which is
packed in units other than 16 bits; the barrel
shift/mask unit allows an arbitrary field from a
data word to be extracted and shifted to any position
in one instruction. For example, a list may
be composed of a series of 16-bit words containing
an ASCII character in the high order byte and a
pointer to the next element in the list in the low
order byte. The barrel shift/mask unit could retrieve
the ASCII character, mask off the parity bit,
and right justify the character in one instruction.
Similarly, an arbitrary bit pattern (e.g. 101)
could be loaded into the comparison register of the
barrel shift/mask unit and a sequence of words then
searched for that pattern.
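
The field extraction in the list example can be written out in a few lines. This is a software sketch of the operation, not of the hardware unit; it assumes that the parity bit is the top bit of the character byte, and the function names and sample word are made up for illustration.

```python
def extract_field(word: int, low_bit: int, width: int) -> int:
    """Return `width` bits of a 16-bit word starting at `low_bit`, right-justified."""
    return (word >> low_bit) & ((1 << width) - 1)

def find_pattern(words, pattern, low_bit, width):
    """Indexes of the words whose selected field equals the comparison value."""
    return [i for i, w in enumerate(words)
            if extract_field(w, low_bit, width) == pattern]

# High byte: 'A' (0x41) with an assumed parity bit set; low byte: pointer 0x2A.
word = (0xC1 << 8) | 0x2A
char = extract_field(word, 8, 7)     # shift out the pointer, mask off parity
pointer = extract_field(word, 0, 8)  # low order byte
print(chr(char), hex(pointer))
```

The second function mirrors the comparison register usage: the pattern is loaded once and each word in the sequence is tested against it after the same shift and mask.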


Multiple index registers (32) in the data memories
allow pointers to be available for several
records at once. These index registers may also
be incremented or decremented in one cycle, facilitating
sequential data accesses or stack operations.
The dual memories allow rapid context
swapping, and the I/O structure is such that a new
context could be loaded from the host's memory
while processing is occurring in the other context.

LOGIC  COMPONENT  TECHNOLOGY 

The logic component technology for this processor,
selected on the basis of performance and cost-effectiveness,
is an emitter-coupled circuit
developed by Control Data in conjunction with chip
fabrication performed by Motorola and Fairchild.

The integrated circuit chip characteristics are
listed in Table 1. The fabrication process provides
a semi-custom revision of LSI. With this
process the cost of fabricating new chip types is
much lower than with custom LSI. In this
technology the diffusion pattern is fixed and is
the same for each chip type. A two-layer metallization
interconnect is used to provide the variable
structure. Features of the printed circuit
board construction are listed in Table 2.


TABLE 1. ECL LSI ARRAY DESCRIPTION

o 168 ECL Gates Per Array
o 2 Input And-Nand Gates
o 165 x 175 Mil Die
o 4 Gates Per Cell
o 48 Signal Pins
o External Gates
o 6 Loads Per Output
o Collector Dotting
o Emitter "And"
o Subnanosecond Gate Delay
o 1-5 Watts

TABLE 2. PRINTED CIRCUIT BOARD

o 10 x 12.5 Inches
o 150 Arrays Per Board
o 15 Layer Boards: 6 Signal Layers, 2 Power Layers
o Buried Layer: Termination Resistors
o 2100 Signal Connections Available

A key element of this technology is that the
machine is completely simulated at the gate level
prior to fabrication. The effectiveness of this
approach has been demonstrated in a test bed project
in which checkout time following assembly was
reduced to a few weeks. This contrasts greatly
with the normal checkout of a new computer, which
can take several months (if not years). Features
of the simulation tools are listed in Table 3.


TABLE 3. AUTOMATED DESIGN TECHNIQUES

o Logic Simulation
  - 75,000 - 100,000 Gates
  - 24 Array Types
  - 10 Boards
  - Gate, Foil, Coax Delays to 10 ps
  - Worst Case Variation

o Board Router
  - Cyclical
  - Interactive
  - Photoplot Tape
  - Simulator Input

o Array Generator
  - Digitized Array Layout
  - Spacing Checks
  - Mask Generation Tape
  - Simulator Input

ASSEMBLER

The assembler runs on a CDC 6400 computer in
batch mode, accepting standard 80-column punched
cards as input. It produces an object code file
as well as a 132-column line printer assembly listing.
The assembly listing contains informative
statistics, a listing of the cards input, the object
code produced, and any error messages. In order
to facilitate rapid debugging, the syntax analyzer
and code generator tag the offending token when an
error is detected. Key features of the assembler
are listed in Table 4.

TABLE 4. ASSEMBLER FEATURES

o Written In Fortran For Portability

o Modularly Written For Ease Of Modification
  As The Processor Design Changes

o Instruction Set Changes Implemented By
  Changing Only One Function Within The
  Assembler

o Language Format (Syntax) Changes Implemented
  By Changing A Set Of Productions
  And Recompiling The Assembler

o Machine Configuration Easily Changed

o Input Text To The Assembler Is Completely
  Free Field: Blanks, Comments, And Blank
  Lines May Be Mixed With Code To Improve
  Readability

o One To Four Op Codes Allowed Per
  Instruction

o Interconnect Routing Set Up Automatically
  By Assembler

o Multiway Branch Blocks Automatically
  Moved To Power Of Two Boundaries

o Errors Individually Flagged And Message
  Printed

o Complete Assembly Listing Printed At
  Completion Of Assembly, Including A
  Symbol Table Dump And All Code Generated

o Several Informative Statistics Are
  Maintained During Program Assembly And
  Printed At The End Of The Assembly
  Listing

The assembler translates a set of symbolic instructions
into the appropriate machine instructions.
A program in the assembly language is a
number (1 or more) of instructions and assembler
directives terminated by an "END" directive. Each
instruction is composed of an optional label
followed by one to four statements and terminated
by a semicolon. A statement defines an operation
for a specific functional unit and generally takes
the form of an op code mnemonic followed by a
number (possibly zero) of operands separated by
commas. Each statement corresponds to one of the
15-bit functional unit fields in the machine
instruction.

Input text is free-format; instructions may
cover several lines, and spaces and tabs may be used
freely to improve the readability of the code. The
syntactic form of the processor programming language
has been developed using a metalanguage with the
following symbols. A class of objects is denoted
by a word (usually descriptive) enclosed by angle
brackets ("<" ">"). When such a class (e.g.
<label>) is used, any one of the members of that
class may be substituted for the occurrence of that
class. Classes enclosed in square brackets
("[" "]") are optional: a member of the class may be
present at that position, but its presence is not
mandatory. Using this notation, an instruction
could be represented as:

[<label>] <statement> [<statement>]
[<statement>] [<statement>] ;

where <label> is any non-reserved word followed
by a colon.
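
The instruction form above can be exercised with a small parser. The sketch below is a hypothetical illustration and not the Fortran assembler: it assumes that a statement begins at a known op code mnemonic, that operand lists contain no embedded blanks, and that the only mnemonics are those appearing in the Table 5 example.

```python
# Mnemonics taken from the Table 5 example; the real instruction set is larger.
MNEMONICS = {"ADD1", "ADD2", "LCMP", "BRNE", "NOOP", "HALT"}

def parse_instruction(text):
    """Split '[label:] statement [statement ...] ;' into (label, statements)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("an instruction must be terminated by a semicolon")
    tokens = text[:-1].split()
    label = None
    if tokens and tokens[0].endswith(":"):
        label = tokens.pop(0)[:-1]
    statements = []
    for token in tokens:
        if token in MNEMONICS:
            statements.append((token, []))              # a new statement begins
        elif statements:
            statements[-1][1].extend(token.split(","))  # operands, possibly empty
        else:
            raise ValueError("operand found before any op code: " + token)
    if not 1 <= len(statements) <= 4:
        raise ValueError("an instruction holds one to four statements")
    return label, statements

print(parse_instruction("I3: ADD2 ADDER1,ADDER2 ADD1 ADDER1,;"))
```

An empty string in an operand list marks an operand position that was left blank, as in "ADD1 ,1".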

The assembler generates code in two passes. On
the first pass (through the input deck), syntax
errors are detected, statement fields are specified,
some constant fields are assigned, labels are defined,
and a data path requirement is generated for
each complete instruction.

The assembler makes the second pass through the
code generated on the first pass after an END or
ENDC is encountered. Constants which were unspecified
after the first pass (e.g. forward branches)
are assigned, and the data path control field bits
are set. At the end of the second pass, the code
generated is punched as an output deck. If an ENDC
was the last directive, the symbol table and code
generator are initialized and the assembler continues
to read input; otherwise execution terminates.
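
The reason for the second pass, resolving constants such as forward branch targets, can be shown with a toy example. The sketch below is a simplification under stated assumptions: each instruction is reduced to a (label, op code, operand) triple, one instruction occupies one address, and the sample program with its forward branch is made up for illustration.

```python
def assemble(program):
    """Two-pass resolution of symbolic branch targets.

    program: list of (label or None, op code, operand), where a string operand
    names a label and None means no operand.
    """
    symbols = {}
    # Pass 1: define every label at the address of its instruction.
    for address, (label, _op, _operand) in enumerate(program):
        if label is not None:
            if label in symbols:
                raise ValueError("label defined twice: " + label)
            symbols[label] = address
    # Pass 2: assign the constants left unspecified by pass 1.
    code = []
    for _label, op, operand in program:
        if isinstance(operand, str):
            if operand not in symbols:
                raise KeyError("undefined label: " + operand)
            operand = symbols[operand]
        code.append((op, operand))
    return symbols, code

program = [
    ("I3", "ADD2", None),
    ("I4", "BRNE", "I6"),   # forward branch: I6 is not yet defined when read
    ("I5", "NOOP", None),
    ("I6", "HALT", None),
]
print(assemble(program))
```

A single pass could not fill in the BRNE target, because the address of I6 becomes known only after the whole deck has been read.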


The assembly example in Table 5 (follows text)
is intended to provide some insight into the type
of programming likely to be encountered with the
processor. The program produces a summation of the
integers from one to ten.

The code uses both adders to sum the integers
from one to ten. Adder 1 is used to provide each
input integer to adder 2, which calculates the running
sum. I0 clears both operand registers of both
adders. I1 latches a 1 in the B operand input
register of adder 1; this 1 is used to increment
the current integer (located in the A operand register)
to find the next integer. I0 and I1 could
not be combined into "ADD1 0,1 ADD2 0,0;" since
this would require two different constants. I2
loads the terminating condition into the compare
register of adder 1. During I3 the new integer
(output of adder 1) and the last sum (output of
adder 2) are added in adder 2. Simultaneously the
integer is incremented in adder 1. I4 tests the
terminating condition; adder 1 contains the next
integer to be added, hence the test against 11
rather than 10. I5 is always executed because
instruction fetches overlap instruction execution.
At the end of I5 either I3 or I6 is executed, depending
on whether the tenth integer has been
added.
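
The behavior of the example program can be traced in software. The sketch below models only the register values described in the walkthrough; it ignores cycle timing and operand latching details, and the function and variable names are illustrative.

```python
def run_sum_program(limit=10):
    """Behavioral trace of instructions I0-I6; returns the final output of adder 2."""
    adder1 = 0                       # I0: clear both adders
    adder2 = 0
    increment = 1                    # I1: latch a 1 in the B operand of adder 1
    adder1 = adder1 + increment      #     adder 1 now outputs the first integer
    compare = limit + 1              # I2: terminating value, one past the last integer
    while True:
        # I3: both adders work from the previous outputs in the same instruction.
        adder1, adder2 = adder1 + increment, adder1 + adder2
        # I4: branch back to I3 unless adder 1 equals the compare register.
        # I5: the NOOP after the branch always executes (fetch overlaps execution).
        if adder1 == compare:
            break
    return adder2                    # I6: HALT; adder 2 contains the total

print(run_sum_program())
```

The tuple assignment mirrors the simultaneous update in I3: adder 2 adds the integer that adder 1 held before the increment, which is why the loop tests against 11 rather than 10.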

SIMULATOR

The CDC/CMU processor simulator is a software
package designed to execute object code intended
for the processor. The register level operations
and results are identical to the hardware version.
During the execution of the program a number of
informative statistics are gathered. Table 6 lists
major features of the simulator.

TABLE 6. SIMULATOR FEATURES


o Written  In  Fortran  For  Portability 

o Modularly  Written  For  Ease  Of  Modification 

o Register  Level  Operations  Identical  To 
Those  Of  The  Processor 

o Load  File  Format  Identical  To  That  Of  The 
Processor 

o Run Time Statistics Maintained Include
The Percent Utilization Of Each Functional
Unit, Number Of Branches Executed, And
Total Simulation Time

o Simulator  Easily  Reconfigured  To  Reflect 
Hardware  Modifications 

o Functional  Unit  Modules  Follow  A Standard 
Format,  Allowing  New  Functions  To  Be  Added 
To  The  Simulator  With  A Minimum  Of  Effort 

o Break  Point  Feature  Allows  The  State  Of 
The  Machine  To  Be  Printed  At  User  Selected 
Points  During  Program  Execution 
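
The standard module format and the utilization statistic listed in Table 6 might be organized roughly as in the following sketch. The simulator itself is written in Fortran; the Python class, its method names, the single "zero" condition code, and the way utilization is counted are all hypothetical choices made for illustration.

```python
class AdderUnit:
    """Hypothetical software functional unit: start, finish, condition codes."""

    def __init__(self):
        self.a = 0            # A operand register
        self.b = 0            # B operand register
        self.output = 0       # result presented after finish()
        self.zero = False     # one illustrative condition code
        self.busy_cycles = 0  # cycles in which the unit was started

    def start(self, a=None, b=None):
        # Save only the operands the instruction supplies; None keeps the old value.
        if a is not None:
            self.a = a
        if b is not None:
            self.b = b
        self.busy_cycles += 1

    def finish(self):
        # Present the result and set the condition code.
        self.output = self.a + self.b
        self.zero = self.output == 0
        return self.output

    def condition_codes(self):
        return {"zero": self.zero}


def utilization(unit, total_cycles):
    """Percent of simulated cycles in which the unit was started."""
    return 100.0 * unit.busy_cycles / total_cycles


unit = AdderUnit()
unit.start(a=0, b=1)
unit.finish()                 # output is 1
unit.start(a=unit.output)     # B operand keeps its latched value
unit.finish()                 # output is 2
print(utilization(unit, 4))   # 50.0
```

A new kind of unit would supply the same three calls, which is one way a common format can keep the cost of adding functions low.
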


Each software functional unit is a separate
module, containing a "start operation" function, a
"conclude operation" function, and a function for
accessing the condition codes. Internally the
units are similar. The op code passed to the
start function is checked for clock operand bits
and the requisite operands are saved. Calling the
finish function causes the result of the operation
to be presented at the appropriate input and the
condition codes to be set.

The simulator produces a listing which includes
break-point printout and a statistics summary.
Simulation of a program to sum the first ten
integers resulted in the simulation listing of
Table 7 (follows text).

TABLE 5. ASSEMBLY EXAMPLE


BEGIN PARSE AT 17.25.09.

V PROGRAM TO ADD THE INTEGERS FROM 1 to 10

    START 0                    V BEGIN EXECUTION AT LOCATION 0
I0: ADD1 0,0 ADD2 0,0;         V CLEAR BOTH ADDERS
I1: ADD1 ,1;                   V LOAD A 1 FOR INCREMENT
I2: LCMP ADDER1,13;            V STOP WHEN WE REACH 10
I3: ADD2 ADDER1,ADDER2         V ACCUMULATE THE SUM
    ADD1 ADDER1,;              V INCREMENT
I4: BRNE ADDER1,I3;            V BRANCH BACK IF NOT DONE
I5: NOOP;                      V ALWAYS EXECUTED
I6: HALT;                      V ADDER2 CONTAINS THE TOTAL
    ENDC

END OF PARSE


SYMBOL TABLE:

SYMBOL  ADDRESS
I0      0000
I1      0001
I2      0002
I3      0003
I4      0004
I5      0005
I6      0006


CODE GENERATED:

PC    FU0    FU1    FU2    FU3    CONST   INTERCONNECTS REQUIRED
0000  00061  00062  00000  00000  000000  77 17 17 17 17 77 77 77
0001  00041  00000  00000  00000  000001  77 77 17 77 77 77 77 77
0002  00101  00000  00000  00000  000013  77 17 77 77 77 77 77 77
0003  00062  00021  00000  00000  777776  77 01 77 01 02 77 77 77
0004  04100  00000  00000  00000  000003  77 77 77 17 77 77 77 77
0005  00440  00000  00000  00000  777776  77 77 77 77 77 77 77 77
0006  01040  00000  00000  00000  777776  77 77 77 77 77 77 77 77

START ADDRESS: 00000

INSTRUCTIONS GENERATED:
INST FIELDS USED:    of 28     CONSTANTS USED: 4 of 7

TOTAL ASSEMBLY TIME: .22 SEC

ASSEMBLY COMPLETE AT 17.25.09.


TABLE 7. SIMULATION PROGRAM


BEGIN PARSE AT 08.40.45.

V PROGRAM TO ADD THE INTEGERS FROM 1 to 10

    START 0
I0: ADD1 0,0 ADD2 0,0;         V CLEAR BOTH ADDERS
I1: ADD1 ,1;                   V LOAD A 1 FOR INCREMENT
I2: LCMP ADDER1,12;            V STOP WHEN WE REACH 10
I3: BRNE ADDER1,I3;            V BRANCH BACK IF NOT DONE
I4: ADD2 ADDER1,ADDER2         V ACCUMULATE SUM
    ADD1 ADDER1,;              V INCREMENT
I5: HALT;                      V ADDER2 CONTAINS TOTAL
    ENDC

END OF PARSE

1. The Texture Problem

Texture  is  one  of  many  cues  available  for 
identification  of  objects  in  scenes.  Other  cues 
are  often  sufficient  for  recognition:  in  a color 
image,  the  difference  between  sky  and  forest  is 
most effectively made by describing principal hue.
Sky  is  often  "mostly  blue  and  white"  while  forest 
is  "mostly  green  and  brown."  Sometimes  the  dis- 
tinction between  areas  is  more  subtle,  such  as 
the  difference  between  a green  lawn  and  green 
bushes.  Here,  the  set  of  picture  elements  (pixels) 
that  make  up  the  lawn  may  be  identical  to  the  set 
of  pixels  that  make  up  the  bushes.  The  only 
difference  between  the  two  areas  is  the  placement 
of  the  various  pixels.  The  lawn  and  bushes 
present  different  two-dimensional  patterns,  and 
these  kinds  of  patterns  are  what  we  mean  by 
texture. 
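
A toy illustration of this point, with two made-up 4 x 4 binary images standing in for the lawn and the bushes: both contain exactly the same collection of pixel values, so any measure that ignores placement cannot separate them. The image contents and the helper name are assumptions chosen only for the example.

```python
from collections import Counter


def histogram(image):
    """Count how many pixels take each value, ignoring where they lie."""
    return Counter(value for row in image for value in row)


stripes = [[0, 0, 0, 0],
           [1, 1, 1, 1],
           [0, 0, 0, 0],
           [1, 1, 1, 1]]
checker = [[0, 1, 0, 1],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [1, 0, 1, 0]]

print(histogram(stripes) == histogram(checker))  # True: same set of pixels
print(stripes == checker)                        # False: different placement
```
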

Computers deal with sampled images. A
sampled  image  is  a matrix  of  pixels.  The  data 
associated  with  a pixel  may  include  intensity, 
hue,  and  saturation,  as  well  as  data  outside  of 
the  visible  spectrum  such  as  infra-red  or  ultra- 
violet components,  and  information  from  other 
sources,  like  range  data  (describing  distance  from 
the observer). When data from more than one
wavelength per pixel is available, the image is
called multi-spectral.

Image  segmentation  is  the  division  of  an 
image  into  semantically  meaningful  regions,  or 
segments.  Image  segmentation  procedures  have  been 
most successful in multi-spectral images, where
many  cues  are  available.  Most  decisions  can  be 
made  by  looking  at  simple  statistical  measures  of 
pixels  in  an  area.  Average  intensity  separates 
dark  areas  from  light  areas.  Principal  hue 
separates  areas  of  different  colors. 

Because  most  objects  in  natural  scenes  differ 
from  their  neighbors  in  simple  spectral  measures, 
the  use  of  texture  as  a cue  has  been  neglected. 

All  image  segmentation  projects  include  some  kind 
of  measure  of  textural  properties  and  describe  how 
they  would  make  use  of  a better  textural  measure 
if  they  had  one.  A standard  approach  is  to  try  to 
quantify  coarseness.  Edge  per  unit  area  is  one 
such measure, used by [Ohlander, 1975] among
others.  This  is  an  intuitively  satisfying  measure, 
and  when  used  with  other  cues  it  has  been  shown 
to  produce  good  results.  Another  approach  has  been 
to  look  at  neighborhood  adjacency  matrices,  and 
characterize  textures  by  statistical  properties  of 
these  matrices. 
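
As a rough sketch of the "edge per unit area" idea, the function below counts the pixels that differ from their right or lower neighbor and divides by the image area. The choice of neighbors, the threshold parameter, and the two test patterns are assumptions made for illustration; they are not taken from the cited work.

```python
def edge_per_unit_area(image, threshold=1):
    """Fraction of pixels that differ from their right or lower neighbor."""
    rows, cols = len(image), len(image[0])
    edge_pixels = 0
    for r in range(rows):
        for c in range(cols):
            right = c + 1 < cols and abs(image[r][c] - image[r][c + 1]) >= threshold
            below = r + 1 < rows and abs(image[r][c] - image[r + 1][c]) >= threshold
            if right or below:
                edge_pixels += 1
    return edge_pixels / (rows * cols)


coarse = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [1, 1, 0, 0],
          [1, 1, 0, 0]]
fine = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]

print(edge_per_unit_area(coarse))  # 0.4375
print(edge_per_unit_area(fine))    # 0.9375
```

The finer pattern yields more edge per unit area, which is the sense in which the measure quantifies coarseness.
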


When  the  number  of  cues  available  in  a scene 
is reduced, the importance of the textural cue
increases.  Aerial  photographs  and  satellite 
images  are  prime  examples  of  images  where  texture 
plays  a key  role  in  terrain  classification.  Often 
the  textural  difference  is  more  subtle  than  a 
simple  measure  of  coarseness.  Other  measures 
range  from  simple  operations  carried  out  in  a 
small  neighborhood,  to  Fourier  techniques  which 
transform  the  image  into  a frequency-space 
representation.  None  of  these  techniques  provides 
a very  satisfying  (that  is,  successful)  texture 
measure. 

2.  A Region-Based  Texture  Model 

Any  texture  measure  is  inherently  a statisti- 
cal one.  Statistics  can  be  thought  of  as  the 
science  which  allows  optimal  prediction  of  the 
true state of nature from several possible unknown
states.  In  texture  discrimination,  one  of  two 
states  of  nature  is  true:  either  two  textures  are 
the  same,  or  they  are  different.  In  texture 
classification,  one  of  many  possible  textures  is 
present  at  a particular  area  of  an  image,  and  the 
problem  is  to  choose  the  appropriate  one.  Decision 
theory  shows  how  to  make  the  best  choice,  and 
determine  a confidence  level  for  that  choice, 
when  given  appropriate  statistical  measures.  Four 
textures  difficult  to  discriminate  are  given  in 
Figure  1. 

The  problem  with  statistical  analysis  is 
that  if  an  inappropriate  set  of  statistical 
measures  is  used,  the  final  results  are  meaning- 
less. For  this  reason,  it  is  important  to  base 
statistics  on  a reasonable  model  of  the  phenomena 
to  be  measured.  In  the  case  of  natural  images, 
it  is  unreasonable  to  attempt  to  derive  meaning- 
ful measurements  from  statistics  based  on  features 
of  individual  pixels.  The  pixel  is  a unit  inti- 
mately tied  to  the  resolution  at  which  an  image 
has  been  scanned.  The  same  texture  scanned  at  two 
different  resolutions  (or  equivalently,  viewed 
from  two  different  distances)  should  produce  the 
same description, with the exception of a factor of
scale.  Statistics  over  pixels,  such  as  adjacency 
matrix  calculations,  actually  give  worse  results 
at  better  (higher)  resolution.  At  high  resolution, 
a pixel  is  likely  to  be  surrounded  by  pixels  of 
similar  value,  except  at  edges,  so  that  neighbor- 
hood statistics  provide  an  expensive  edge 
operator.  The  neighborhood  measures  were  designed 
to provide information about spatial relationships
among pattern elements in textures. Julesz [1975]
proposes that patterns made up of random dots can
be described by enumerating the probability of
finding an element of intensity "i" at distance
"r" and angle "theta" from another element of
intensity "j", for all i, j, r, theta. By limiting
the allowable distance to 1 for computational
efficiency, almost all spatial information is lost.
In any case, Julesz was referring to patterns made
up of uniform elements. In dealing with natural
textures, the appropriate "pattern element" is
not easy to define, and exact boundaries for
elements are difficult to produce.
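
The pairwise probabilities described above can be tabulated for one displacement at a time. In this sketch the displacement is given as a row and column offset (dr, dc) rather than as a distance and an angle, which is an equivalent parameterization on a pixel grid; the function name and the test image are assumptions.

```python
from collections import Counter


def spatial_dependency(image, dr, dc):
    """Probability of value i at offset (dr, dc) from a pixel of value j.

    Keys of the returned dict are (i, j) pairs.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r2][c2], image[r][c])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


stripes = [[0, 0, 0, 0],
           [1, 1, 1, 1],
           [0, 0, 0, 0],
           [1, 1, 1, 1]]

print(spatial_dependency(stripes, 0, 1))  # along a stripe: only (0, 0) and (1, 1)
print(spatial_dependency(stripes, 1, 0))  # across stripes: only (1, 0) and (0, 1)
```

A full description needs one such table for every offset of interest, which is why restricting the distance to 1 saves computation and also discards most of the spatial information.
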

We have generalized the idea of spatial
relationships in textured images by dealing with
clusters of pixels, called "texture primitives."
The texture primitives are restricted to clusters
of "simple" shapes, which can be approximated by
convex polygons. This restriction not only
finesses the hard problem of complex shape
description, but also provides a better basis for
textural description. For example, a pattern made
up of "T" shaped clusters is not greatly altered
if the horizontal and vertical bars of the "T"
are disconnected. This similarity is immediately
revealed by representing the basic "T" element as
two long elements with orthogonal axes, which are
closest at the midpoint of one element and an
endpoint of the other; the only difference lies
in the exact distance between the elements. In
natural textures, there is usually no unique
pattern element, and important features, like
orientation, can be successfully characterized by
statistics of the individual texture primitives.

Our  strategy,  then,  is  to  break  an  image 
down  into  texture  primitives,  examine  local  shape 
properties  of  these  primitives,  and  examine  a 
restricted  set  of  spatial  relationships  among 
groups  of  these  primitives.  While  our  clustering 
strategy  for  producing  texture  primitives  is 
limited  here  to  a criterion  of  intensity  close- 
ness, the  idea  of  the  texture  primitive  is  not 
limited  to  proximal  intensity  grouping.  A texture 
whose  basic  elements  are  themselves  textured  can 
easily  be  included  by  using  an  appropriate  texture 
measure  to  form  texture  primitives.  This  process 
does  not  lead  us  into  any  problems  of  infinite 
recursion  because  by  the  time  a third  level  of 
indirection  occurs  (textured  elements  forming 
different  textures  which  form  elements  for  yet 
another  texture),  the  patches  which  would  have 
formed  the  basis  for  the  highest-level  texture 
may  be  considered  objects  in  the  image. 
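
One simple way to form clusters under an intensity-closeness criterion is to grow 4-connected regions outward from a seed pixel. The sketch below is an assumed stand-in for the clustering step, not the procedure actually used for the results reported here; in particular the seed-relative tolerance and the 4-connectivity are illustrative choices, and no convexity restriction is enforced.

```python
def texture_primitives(image, tolerance=0):
    """Group 4-connected pixels whose intensity is within tolerance of a seed."""
    rows, cols = len(image), len(image[0])
    label = [[None] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if label[r][c] is not None:
                continue
            seed = image[r][c]
            index = len(regions)
            label[r][c] = index
            stack, pixels = [(r, c)], []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and label[ny][nx] is None
                            and abs(image[ny][nx] - seed) <= tolerance):
                        label[ny][nx] = index
                        stack.append((ny, nx))
            regions.append(pixels)
    return regions


image = [[9, 9, 0, 0],
         [9, 9, 0, 0],
         [0, 0, 0, 5],
         [0, 0, 0, 5]]
regions = texture_primitives(image)
print(sorted(len(p) for p in regions))  # [2, 4, 10]
```

Each returned region is a list of (row, column) pixels, which is the form assumed by any later shape measurement.
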

3.  Local  Properties  of  Regions 

Information  about  textural  orientation  and 
scale  could  be  captured  using  edge  data.  In 
general,  edge  analysis  is  a dual  to  region  des- 
cription: every  region  can  be  described  by  a set 
of  edges,  and  every  edge  can  be  described  as  the 
boundary  between  two  regions.  The  use  of  a region- 
al representation  is  chosen  because  it  is  a more 
reasonable  model  of  texture.  Regions  facilitate 
detection  of  classes  of  similar  elements,  as  well 
as  spatial  relationships  among  elements. 

Local analysis refers to measurements made
on features of the texture primitives themselves.
Textures made up of different primitive elements
arranged in the same spatial order will appear
different, with the degree of difference depending
on the differences between primitives. Two kinds
of features are important for classifying primitive
elements. First is the similarity along the
measure used to cluster the pixels; in the case of
monochrome images, this is simply average intensity.
Second is a description of shape features.

The important shape measures for the texture
primitives are eccentricity and axial orientation.
The eccentricity of a shape is a ratio of major
to minor axis (computed as the principal axes of
inertia). Every shape with non-zero eccentricity
has its major axis as its orientation. Eccentricity
and orientation are scale-invariant, and a measure
of the distribution of region sizes is used to
provide scaling information.
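
The shape measures above can be computed from the second moments of a region's pixels. The sketch below is one possible formulation under stated assumptions: a region is a list of (row, column) pixels, each pixel is treated as a unit square (hence the 1/12 term), eccentricity is the ratio of the principal axis lengths, and orientation is measured in degrees from the column axis toward increasing row. With this ratio definition a compact shape such as a square gives an eccentricity of 1.

```python
import math


def shape_features(pixels):
    """Return (eccentricity, orientation in degrees, size) of a pixel region."""
    n = len(pixels)
    mean_r = sum(r for r, _ in pixels) / n
    mean_c = sum(c for _, c in pixels) / n
    # Central second moments. The 1/12 term is the second moment of a unit
    # square, so a one-pixel-wide line still has a finite minor axis.
    m_rr = sum((r - mean_r) ** 2 for r, _ in pixels) / n + 1.0 / 12.0
    m_cc = sum((c - mean_c) ** 2 for _, c in pixels) / n + 1.0 / 12.0
    m_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / n
    # Eigenvalues of the 2 x 2 moment matrix give the squared axis lengths.
    half_sum = (m_rr + m_cc) / 2.0
    root = math.hypot((m_cc - m_rr) / 2.0, m_rc)
    major, minor = half_sum + root, half_sum - root
    eccentricity = math.sqrt(major / minor)
    orientation = 0.5 * math.degrees(math.atan2(2.0 * m_rc, m_cc - m_rr))
    return eccentricity, orientation, n


bar = [(0, c) for c in range(8)]           # a horizontal 1 x 8 bar
square = [(0, 0), (0, 1), (1, 0), (1, 1)]  # a compact 2 x 2 block
print(shape_features(bar))
print(shape_features(square))
```

Scaling a region changes only the size term, which is why size has to be recorded separately to recover scale.
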

Many textures can be discriminated using
only these kinds of local measurements. For
example, a texture made up of long, thin regions
at a particular orientation is immediately
distinguished from textures with no prevalent
orientation, or from textures with no long, thin
regions. There is no combination of translation,
rotation, or uniform scaling operations which can
transform a highly-oriented texture to one which
has no orientation preference. An important aspect
of this approach to texture is that it provides a
richer description than merely a point in a
continuous n-dimensional feature space. Textures
which cannot be rotated, translated, or scaled in
2-D to be similar can be classed as "absolutely
different."

One  simple  discrimination  strategy  would  be 
to  separate  textures  into  classes  depending  on 
some  measure  of  average  eccentricity,  average 
region  size,  and  axial  orientation.  For  textures 
which  are  quite  different,  this  kind  of  a tech- 
nique is  fast  and  efficient.  For  similar  textures, 
however, this kind of strategy is prone to error.
A much more powerful technique is available from
local property data: examining the statistically
dependent features, and characterizing the kinds
of dependencies.

For  regular  patterns  where  particular 
elements  occur  frequently,  a particular  measure 
over  all  texture  primitives  in  an  image  may  not  be 
significant,  but  that  measure  may  be  quite  mean- 
ingful when  used  over  a subset  of  the  primitives. 
For  example,  a texture  which  contains  long  black 
rectangles  on  a noisy  background  may  have  low 
mean  eccentricity,  when  averaged  over  all  regions, 
but  extremely  high  eccentricity  in  a small,  low 
intensity  range.  Examining  distributions  of 
eccentricity,  size,  and  axial  orientation  over 
different  intensity  ranges  will  produce  a valuable 
description  component  when  feature  values  are  seen 
to  cluster  in  different  ranges.  A texture  whose 
large  regions  are  mainly  of  low  intensity  will  be 
different  than  one  whose  large  regions  are  con- 
centrated in  the  high  intensity  range  (see  Figure 
2).  In  the  images  used  here,  intensity  histograms 
have  been  normalized  so  that  the  number  of  pixels 
in  each  intensity  range  is  the  same.  It  is  inter- 
esting to  note  that  the  apparent  intensity  range 
still  differs,  and  appears  to  be  based  on  the 
intensity  of  primitive  elements  which  are  impor- 
tant to  the  texture. 
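
The black-rectangle example can be made concrete with a small sketch that averages a feature separately within each intensity range. The dictionary layout of a primitive, the number of ranges, and the sample values are all invented for illustration.

```python
def mean_by_intensity_range(primitives, feature, n_ranges=4, max_intensity=256):
    """Average a feature of texture primitives within each intensity range."""
    width = max_intensity / n_ranges
    sums = [0.0] * n_ranges
    counts = [0] * n_ranges
    for p in primitives:
        k = min(int(p["intensity"] / width), n_ranges - 1)
        sums[k] += p[feature]
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Two long dark rectangles on a background of compact, brighter regions.
primitives = (
    [{"intensity": 10, "eccentricity": 9.0}] * 2
    + [{"intensity": i, "eccentricity": 1.0}
       for i in (80, 100, 120, 140, 160, 180, 200, 240)]
)

overall = sum(p["eccentricity"] for p in primitives) / len(primitives)
print(overall)                                              # 2.6
print(mean_by_intensity_range(primitives, "eccentricity"))  # [9.0, 1.0, 1.0, 1.0]
```

The overall mean hides the rectangles, while the per-range means show that high eccentricity is confined to the darkest range.
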

Figure 2: Identical Textures (A & B) with Intensity Reversal.
Both have identical intensity histograms, but
average size or eccentricity vs. intensity range
indicates differences.

In naturally occurring textures, most of
the information available occurs in local
measures. There is no particular spatial regularity
of texture primitives in foliage, or gravel. In
regular  patterns,  like  a brick  wall,  the  spatial 
relationships  among  regions  will  be  necessary  for 
discrimination  among  different  regular  kinds  of 
bricklaying,  but  will  not  be  very  important  to 
separate  the  brick  from  the  bushes.  In  the  more 
general  problem  of  developing  a theoretical  basis 
for  textures,  it  is  important  to  deal  with 
textures  made  up  of  the  same  primitives  arranged 
differently.  When  local  properties  related  to 
scale  or  orientation  are  removed,  descriptions 
based  only  on  local  information  will  be  inadequate. 

4.  Spatial  Relationships  Among  Regions 

One  kind  of  relational  measure  is  suggested 
by  the  Julesz  spatial  dependency  matrix.  While 
such  a matrix  may  contain  all  of  the  important 
information  for  describing  spatial  relationships 
in  a texture,  it  also  contains  much  data  that  is 
not germane. Julesz's binary patterns have two
kinds  of  primitive  regions,  but  as  the  number  of 
classes  of  primitive  regions  grows,  the  amount  of 
data  which  is  strictly  an  artifact  of  the  under- 
lying structure  of  a texture  increases  rapidly. 

For  example,  it  is  easy  to  describe  a regular 
pattern  of  similar  dots  which  are  spaced  regular- 
ly in  both  the  horizontal  and  vertical  directions. 
All  that  is  necessary  is  the  dot  size  and 
intensity,  and  the  bi-directional  periodicity. 

If  a second  grid  of  dots,  with  different  intensity 
and  period,  is  added  to  the  first  grid,  a complex 
Moire  pattern  results.  However,  the  underlying 
structure  of  the  pattern  is  still  easy  to 
describe;  it  is  simply  the  superposition  of  the 
two  regular  grids.  Examining  the  complex  inter- 
actions between  the  different  kinds  of  dots 
produces  a great  deal  of  data,  but  no  additional 
information.  Spatial  dependency  matrices  would  not 
only  include  all  of  these  interactions,  but  also 
many  interactions  between  similar  dots  which  are 
not  related  to  the  bi-directional  regularity. 

What  is  needed  is  a characterization  of  only  that 
spatial  data  which  is  meaningful  to  a texture. 

The  key  to  relational  analysis  is  the  con- 
struction of  a useful  set  of  relational  features. 
The  problem  here  is  identical  to  all  statistical 
problems: how to compress a great deal of data
into  a small  amount  of  information.  An  enumeration 
of  all  spatial  relationships  extant  in  a texture 
is  not  useful:  two  areas  of  the  same  texture  will 
seldom  have  exactly  the  same  set  of  relationships, 
and  it  is  not  clear  how  to  measure  the  distance 
between  two  sets  of  relationships.  On  the  other 
hand,  with  statistical  measures  it  is  easy  to  dis- 
tort the  original  data  beyond  recognition. 

Usually,  the  kind  of  spatial  relationships 
which  allow  human  observers  to  discriminate  among 
different  textures  can  be  described  very  simply. 

It  is  easy  to  discriminate  between  two  dot 
patterns  of  equal  average  dot  density,  but  where 
one  texture  requires  that  every  pair  of  dots  be 
separated  by  some  minimum  distance.  The  difference 
between  these  two  kinds  of  dot  patterns  is  cer- 
tainly represented  in  the  spatial  dependency 
matrix, but use of the full matrix is data
overkill. All of the data concerning angular separation
between pairs of dots is extraneous. There is in
fact only one number of interest, the radius of
minimum separation.
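
That single number is cheap to obtain directly from the dot positions. The sketch below compares two made-up patterns with the same number of dots in the same area; the coordinates and the function name are assumptions.

```python
import math
from itertools import combinations


def minimum_separation(dots):
    """Smallest distance between any pair of dot centers."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in combinations(dots, 2))


spaced = [(0, 0), (3, 0), (0, 4), (3, 4)]      # every pair kept apart
clumped = [(0, 0), (0.5, 0), (0, 4), (3, 4)]   # same count, no constraint

print(minimum_separation(spaced))   # 3.0
print(minimum_separation(clumped))  # 0.5
```
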

We propose here that the interesting
relationships are those between primitive elements in
the same class, and that inter-class relations
are usually artifacts. Several experimental
paradigms demonstrate the tendency of humans to take
cognizance of relationships among similar objects,
and disregard the same statistical relationships
among different objects. For instance, when a
random  dot  pattern  is  turned  a few  degrees  around 
its  center,  and  printed  on  top  of  the  unrotated 
pattern,  circular  formations  are  immediately 
recognized.  If  the  second,  rotated,  pattern  is 
presented  in  a different  color,  the  circularity  is 
not  detected.  Here,  the  spatial  relationships 
among  dots  are  not  recognized  unless  the  dots  are 
all  members  of  the  same  class. 

The  kinds  of  relational  measures  that  we  use 
yield  data  similar  to  some  of  the  local  operators. 
Colinearity  is  an  important  feature,  and  includes 
orientation  data  that  is  more  global  than  the 
orientation  measure  provided  by  examining  orienta- 
tions of  individual  texture  primitives.  Here,  for 
two eccentric regions to be colinear, their
principal  axes  must  be  similar.  Three  non- 
eccentric regions  are  needed  before  any  measure 
of  colinearity  can  be  made.  When  eccentric  re- 
gions are  lined  up  along  their  minor  axes,  it 
seems  more  reasonable  to  describe  the  relationship 
as  parallel.  Other  kinds  of  useful  relationships 
which  seem  to  be  important  include  T-joints  and 
V-joints. 
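
A minimal sketch of how the colinear and parallel relations between two eccentric regions might be tested follows. Each region is reduced to a centroid and an axis orientation in degrees; the tuple layout, the 15 degree tolerance, and the decision rule are assumptions chosen for illustration rather than the tests actually used.

```python
import math


def angle_difference(a, b):
    """Smallest difference between two axis directions (axes repeat every 180)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def relation(region_a, region_b, tolerance=15.0):
    """Classify two eccentric regions (x, y, orientation) as related or not."""
    (xa, ya, theta_a), (xb, yb, theta_b) = region_a, region_b
    if angle_difference(theta_a, theta_b) > tolerance:
        return None                      # principal axes are not similar
    joining = math.degrees(math.atan2(yb - ya, xb - xa))
    offset = angle_difference(joining, theta_a)
    if offset <= tolerance:
        return "colinear"                # lined up along the major axes
    if offset >= 90.0 - tolerance:
        return "parallel"                # lined up along the minor axes
    return None


print(relation((0, 0, 0), (10, 0, 5)))   # colinear
print(relation((0, 0, 0), (0, 5, 0)))    # parallel
print(relation((0, 0, 0), (10, 0, 60)))  # None
```
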

The  apparent  nonlocal  nature  of  relational 
analysis  presents  several  problems.  Standard 
global  analysis  would  choose  some  window  of  an 
image,  and  look  for  a description  of  the  relation- 
ships which  exist  inside  of  that  window.  This  kind 
of  approach  is  computationally  costly  and  prone 
to  several  pitfalls.  Because  of  the  necessarily 
large  domain  over  which  relations  are  being  com- 
puted, parallel  processing  models  do  not  offer 
substantial  time  savings.  Because  some  arbitrary 
window  must  be  chosen,  windows  which  span  more 
than  one  textural  area  will  produce  meaningless 
results.  Choice  of  an  appropriate  window  size 
presents a dilemma.

Although  relational  measures  describe 
statistics  over  a group  of  texture  primitives, 
this  does  not  mean  that  the  benefits  of  local 
computation  must  be  lost.  Instead  of  looking  for 
relationships within a window, we look for
relationships for each primitive within a neighborhood
whose shape is determined by the shape of the
primitive. That is, a particular eccentric region
can belong to at most one colinear set, and one
parallel  set.  The  likelihood  of  a particular  rela- 
tionship can  be  measured  for  every  region  in  the 
neighborhood,  and  if  the  best  likelihood  is  higher 
than  some  threshold,  the  existence  of  the  rela- 
tionship is  posited.  By  choosing  a small  set  of 
relations  that  are  recognized,  the  relational  data 
for  a particular  primitive  element  becomes  simply 
another feature of that element. The same kind of
techniques used to describe textures (partially)
by using shape features of texture elements can
be extended to the new relational features.
Arbitrary, non-linearly separable textural boundaries
can still be found by clustering texture elements
into compatible sets.
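
The per-primitive search described above, in which the best likelihood in the neighborhood is compared against a threshold, could be sketched as follows. The likelihood function shown is an invented example (a product of squared cosines), and the threshold of 0.5 is arbitrary; only the structure of the search is meant to reflect the text.

```python
import math


def colinear_likelihood(p, q):
    """Illustrative score in [0, 1]; p and q are (x, y, orientation in degrees)."""
    joining = math.atan2(q[1] - p[1], q[0] - p[0])
    along_axis = math.cos(joining - math.radians(p[2])) ** 2
    same_axis = math.cos(math.radians(q[2] - p[2])) ** 2
    return along_axis * same_axis


def best_relation(primitive, neighbors, likelihood, threshold=0.5):
    """Return the neighbor with the highest likelihood above threshold, or None."""
    best, best_score = None, threshold
    for other in neighbors:
        score = likelihood(primitive, other)
        if score > best_score:
            best, best_score = other, score
    return best


p = (0, 0, 0)
print(best_relation(p, [(5, 0, 0), (0, 5, 0), (5, 5, 45)], colinear_likelihood))  # (5, 0, 0)
print(best_relation(p, [(0, 5, 0), (5, 5, 45)], colinear_likelihood))             # None
```

The result, a partner or None, can then be stored with the primitive as one more feature alongside its shape measures.
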

An  example  of  the  use  of  relational  measures 
is  discrimination  between  water  and  straw,  when 
the  appropriate  scaling  and  rotational  normaliza- 
tions have  been  executed.  The  resulting  textures 
are  quite  similar,  but  the  straw  has  many  regions 
which  are  colinear  along  the  major  axis  of  orien- 
tation. The  water  contains  regions  which  are 
parallel  along  an  axis  which  is  unrelated  to  the 
orientation  of  individual  regions,  but  dependent 
on  the  illumination  angle.  Figure  3 shows  these 
textures,  and  Figure  4 the  processing  steps. 

These simple relations add enough power to
the local properties already described to
differentiate among most natural textures. When
discrimination fails, the textures which have been
unsuccessfully classified are practically
indistinguishable to a human observer, and the
failures relate to irregularities within a single
texture. If the textural classification of
surrounding areas were used as an additional cue in
hard cases, near perfect classification would be
possible.

5.  Utilizing  Texture  Measures:  Design  Decisions 

Successful  statistical  analysis  depends  on  a 
realistic  model  of  the  population  from  which  data 
has been gathered. By choosing a reasonable,
region-based model of texture, it is possible to
provide descriptions which are useful in a wide
range  of  image  segmentation  problems.  The  most 
important  decision  was  to  represent  textures  by 
analyzing  what  kind  of  clustering  occurs  among 
pixels,  rather  than  trying  to  collect  a set  of 
features  for  each  pixel. 

In  any  problem  domain  it  is  important  to 
remain  flexible  in  spite  of  permutations  in  the 
given  problem.  A very  different  set  of  images 
might  call  for  a slightly  different  set  of  fea- 
tures. In  fact,  given  any  strategy  for  discrimi- 
nating texture,  it  will  be  possible  to  construct 
textures  which  cannot  be  discriminated  using  that 
strategy,  but  which  will  be  discriminated  using 
another  approach.  For  example,  textures  made  up 
of  triangles  will  not  be  discriminable  using  the 
set  of  features  described  here  from  another  tex- 
ture in  which  the  triangles  have  been  replaced  by 
equivalent  area  squares.  It  is  easy  to  see  what 
feature  would  be  sufficient  for  discrimination  in 
this  case.  One  example  of  two  textures  not  dis- 
criminable by  humans  is  given  by  Julesz  in  his 
figure  contrasting  randomly  distributed  "R's"  with 
randomly  distributed  mirror-image  "R's."  The 
immediate  impression  is  that  the  two  textures  are 
identical,  but  an  analytical  examination  will 
quickly  find  the  boundaries  between  the  two 
textures. 

When  constructing  a set  of  textural  descrip- 
tors, the  nature  of  the  problem  domain  must  be 
considered.  More  information  will  only  produce 
better  results  in  any  reasonable  system,  and  the 
system  described  here  is  not  an  exception  to  that 
rule.  At  every  stage  of  computation  where  it  is 
possible  to  use  knowledge  already  gained  to  pre- 
dict the  most  effective  subsequent  computation, 
that  knowledge  should  be  used. 


6.  Applications  and  Related  Work 

Texture  can  be  an  important  cue  for  segmenta- 
tion in  many  problem  domains.  Terrain  classifica- 
tion is  often  only  possible  with  textural 
descriptors.  Natural  scenes  are  full  of  textured 
areas,  and  successful  segmentation  of  natural 
images  has  been  possible  only  by  limiting  the 
textural  content  of  input  scenes.  Using  texture 
description  procedures  will  improve  the  perform- 
ance of  segmentation  programs,  yielding  more 
accurate  region  assignments,  and  reducing  the 
computing  time  required  for  textured  areas. 

Texture  often  provides  additional  cues,  like 
surface  orientation  or  depth  information.  These 
cues  are  called  texture  gradients.  Texture 
gradient  information  can  be  extracted  from  tex- 
tural description,  using  a realistic  model  of  the 
transformations  caused  by  altering  orientation. 
Some  kinds  of  transformations  are  due  to  three- 
dimensional  phenomena,  and  require  different 
models  for  what  "texture  gradient"  means.  Black 
dots  painted  on  a white  sphere  will  undergo  a 
particular  kind  of  shape  transformation  as  they 
move  away  from  the  sphere  center.  Black  spikes 
protruding  from  a white  sphere  will  give  the 
appearance  of  the  same  texture  in  the  center  of 
the  sphere,  but  a very  different  textural  trans- 
formation indicates  distance  from  the  center.  In 
both  cases,  the  texture  gradient  will  be  reflected 
in  predictable  changes  of  both  local  and  relation- 
al statistics  of  texture  primitives.  One  direction 
for  further  research  is  to  set  up  appropriate 
models  for  such  gradient  transformations,  and 
produce  algorithms  which  detect  these  changes. 

Depth  cues  from  textural  transforms  present 
the  problem  of  textural  artifacts  produced  by  low- 
resolution  scanning.  As  a texture  recedes,  it  will 
eventually  reach  the  point  that  the  scanning 
resolution  is  insufficient  to  detect  important 
texture  primitives.  Intuitively,  a texture  should 
become  less  coarse  as  it  recedes,  since  the  pri- 
mitive elements  become  smaller.  But  the  sampling 
problems  often  cause  fine  textures  to  become 
coarser  as  they  recede.  Again,  it  is  possible 
to  predict  the  distance  at  which  such  anomalies 
occur,  and  use  neighboring  texture  information  to 
hypothesize  a continuous  depth  change  across  an 
apparent  textural  break. 

Texture  is  a high-resolution  phenomenon.  It 
would  be  useful  to  provide  input  sensors  with  the 
ability  to  monitor  an  area  of  textural  interest 
at  high  resolution,  and  produce  a textural  des- 
cription of  that  area.  This  would  make  textural 
cues as easy to use as multi-spectral information,
and  add  considerable  power  without  sacrificing 
the  benefits  of  lower-resolution  scanning  for 
general  segmentation.  While  low-resolution  images 
may  appear  to  contain  textural  areas,  the  range 
of  such  textures  is  necessarily  small,  and  often 
reflects  artifacts  of  the  scanning  resolution. 

The  appearance  of  varied  textures  in  low- 
resolution  images  is  largely  due  to  semantic 
prediction  of  what  texture  ought  to  be  present  in 
a recognized  image.  Recognition  of  low-resolution 
faces  is  one  example  of  the  brain's  excellent 
capacity  to  produce  effective  hypotheses  with 
little input information. Trying to find texture
in low-resolution images is a little like trying
to predict color from a black and white image.
Although a person might predict many colors quite
successfully, there is no reason to believe that
such color information is contained in the raw
input data.

Figure 3: A straw texture (A) may be normalized (B)
to appear similar to a water texture (C). Normalized
straw and water may be discriminated by region colinearity.

7.  Summary 

Texture  analysis  is  a tool  to  detect  the 
forest  from  the  trees.  Understanding  texture  does 
not  solve  the  image  understanding  problem,  nor 
does  it  produce  the  ultimate  region  grower. 

Texture  descriptions  provide  a useful  input  to 
higher-level understanding programs, and a valuable
cue for many segmentation problems. We have
presented  here  a technique  for  producing  useful 
descriptions  of  texture  based  on  local  shape  sta- 
tistics of  simple  regions  called  texture  primi- 
tives, and  on  a simple  set  of  relationships  among 
similar  classes  of  texture  primitives.  The  useful- 
ness of  this  description  has  been  demonstrated 
through  standard  discrimination/classification 
tests.  More  importantly,  these  descriptions  pro- 
vide quantification  of  discrimination  criteria 
and  allow  recognition  of  similar  textures  over 
transformations  of  scale,  intensity,  and  rotation. 
Some  different  textures  can  be  normalized  to 
produce  descriptions  which  are  as  similar  as 
possible,  and  these  normalized  textures  do  in 
fact  appear  quite  similar.  Proposed  extensions 
include  using  a world  model  to  detect  spatial 
orientation and distance transformations.

ABSTRACT

This  note  summarizes  two  recent 
studies  on  "relaxation"  techniques  for 
image  segmentation: 

1)  The  use  of  hierarchical  discrete 
relaxation  for  waveform  parsing 

2)  The  estimation  of  coefficients 
for  probabilistic  relaxation  pro- 
cesses by  statistical  analysis  of 
input images.

INTRODUCTION

"Relaxation"  is  an  iterative  approach 
to  classifying  a set  of  interrelated 
objects (e.g., parts of an image). In
"discrete relaxation", a set of possible
class  names  is  initially  associated  with 
each  object.  At  subsequent  iterations, 
class  names  are  discarded  from  an  object 
if  they  are  inconsistent  with  the  surviving 
class  name  possibilities  for  other,  re- 
lated objects;  this  is  done  "in  parallel", 
for  all  objects  simultaneously.  The  pro- 
cess is  repeated  until  no  further  changes 
can  take  place;  it  often  yields  highly  un- 
ambiguous classifications.  Waltz  [1] 
applied  this  approach  to  labeling  the 
parts of a line drawing; a general discussion
of such methods can be found in [2].
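The discarding process described above can be sketched as follows. This is a minimal illustration, not the procedure of [1] or [2]: the function name `discrete_relaxation`, the data layout, and the toy compatibility test are all assumptions made for the example. A class name survives only if every related object still holds at least one class name consistent with it, and all objects are updated from the same snapshot to mimic the "in parallel" step.

```python
def discrete_relaxation(labels, neighbors, compatible):
    """Iteratively discard class names that lack support from related objects.

    labels:     dict mapping object -> set of candidate class names
    neighbors:  dict mapping object -> list of related objects
    compatible: function (obj_a, name_a, obj_b, name_b) -> bool
    """
    current = {obj: set(names) for obj, names in labels.items()}
    while True:
        # Every object is updated from the same snapshot ("in parallel").
        updated = {}
        for obj, names in current.items():
            updated[obj] = {
                name for name in names
                if all(
                    any(compatible(obj, name, other, other_name)
                        for other_name in current[other])
                    for other in neighbors.get(obj, [])
                )
            }
        if updated == current:      # no further changes can take place
            return current
        current = updated


# Toy problem (hypothetical): three objects in a chain must carry
# strictly increasing numbers from left to right.
order = {"a": 0, "b": 1, "c": 2}
labels = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {1, 2, 3}}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

def increasing(obj_a, name_a, obj_b, name_b):
    return name_a < name_b if order[obj_a] < order[obj_b] else name_a > name_b

result = discrete_relaxation(labels, neighbors, increasing)
```

In this toy case the nine initial possibilities collapse to a single class name per object after two iterations, which is the kind of unambiguous outcome the text refers to.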

In  "probabilistic  relaxation",  a set 
of  estimates  of  class  assignment  probabi- 
lities is  initially  associated  with  each 
object.  At  subsequent  iterations,  the 
probabilities  are  adjusted  in  accordance 
with  the  support  that  they  receive  from 
the  class  probabilities  of  related  objects. 
This  process,  when  repeated,  often  leads 
to  a marked  reduction  in  the  ambiguity  of 
the  classifications.  A general  discussion 
of  such  methods  can  be  found  in  [2,  3], 
and  a collection  of  examples,  involving 
the labeling of image points for segmentation
or noise cleaning purposes, is reviewed
in [4].
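A single iteration of such a process might look like the sketch below. The note does not state the update formula, so the rule used here (multiply each probability by one plus its average support, then renormalize) is an assumption, chosen because it is a commonly used form; the names `relaxation_step` and `same_class` are hypothetical. Coefficients are assumed to lie in [-1, 1] so that the adjusted values stay non-negative.

```python
def relaxation_step(probs, neighbors, coeff):
    """One parallel update of all class probabilities.

    probs:     dict object -> dict class -> probability (sums to 1 per object)
    neighbors: dict object -> list of related objects
    coeff:     function (class_a, class_b) -> value in [-1, 1]
    """
    new_probs = {}
    for obj, dist in probs.items():
        related = neighbors.get(obj, [])
        adjusted = {}
        for cls, p in dist.items():
            # Average support for (obj, cls) from the related objects.
            support = 0.0
            for other in related:
                support += sum(coeff(cls, other_cls) * q
                               for other_cls, q in probs[other].items())
            if related:
                support /= len(related)
            adjusted[cls] = p * (1.0 + support)
        total = sum(adjusted.values())
        # Renormalize; keep the old values if everything was suppressed.
        new_probs[obj] = ({c: v / total for c, v in adjusted.items()}
                          if total > 0 else dict(dist))
    return new_probs


# Two related objects; identical classes reinforce, different ones conflict.
def same_class(class_a, class_b):
    return 1.0 if class_a == class_b else -1.0

probs = {"a": {"edge": 0.6, "none": 0.4}, "b": {"edge": 0.9, "none": 0.1}}
neighbors = {"a": ["b"], "b": ["a"]}
after = relaxation_step(probs, neighbors, same_class)
```

Here the weakly held "edge" hypothesis at object "a" is strengthened by the confident "edge" hypothesis at its neighbor, illustrating the reduction in ambiguity.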


This  note  summarizes  two  recent 
studies on relaxation methods:

1)  The  use  of  hierarchical  discrete 
relaxation  for  waveform  parsing 

2) The estimation of coefficients
for probabilistic relaxation processes
by statistical analysis
of input images

HIERARCHICAL  RELAXATION  FOR  WAVEFORM 
PARSING 

Images  and  waveforms  can  often  be 
described  hierarchically  as  consisting  of 
parts  that  are  in  turn  composed  of  sub- 
parts, and  so  on,  down  to  a level  of 
"primitive"  parts;  where  at  each  level, 
the  parts  are  in  approximately  specified 
positions  relative  to  one  another.  Such 
a hierarchical  structure  is  essentially  a 
layered system of "spring-loaded" templates
[5]. The process of recognizing
that  a description  of  this  type  applies 
to  a given  image  or  waveform  is  essen- 
tially a process  of  parsing  with  respect 
to  a stratified  context-free  grammar,  with 
the  primitive  parts  as  terminal  symbols. 

In [6], a parallel, iterative method
(called  a "relaxation"  method)  of  detect- 
ing spring-loaded  template  matches  was 
proposed.  In  this  method,  matches  to  the 
subtemplates  are  detected,  and  for  each 
such match, supporting evidence is sought
(i.e., whether other matches occur in the
expected relative positions). Subtemplate
matches  for  which  sufficient  evidence  is 
lacking  are  discarded,  and  the  process  is 
iterated  (since  discarding  one  subtemplate 
may weaken the evidence for another one).
This  process  can  be  carried  out  in 
parallel  for  all  the  subtemplate  matches, 
so  as  to  rapidly  eliminate  all  but  those 
that  belong  to  matches  of  the  entire 
spring-loaded  template. 

This  relaxation  approach  can  be  gen- 
eralized to  hierarchical  spring-loaded 
templates.  Here  the  interactions  among 
the  parts  are  more  complicated;  when  a 
part  is  discarded  at  one  level,  this  can 
cause  other  parts  to  be  discarded  at 
other  levels,  and  further  iteration  may 
be  needed  at  all  of  these  levels.  The 
procedure  involves  the  following  steps  [7]: 


1) Primitive detection, to create
the lowest layer of the hierarchical
network; or, more generally,
given the nth layer, creation
of the (n+1)st layer by detecting
matches to the appropriate set of
templates.

2)  Elimination  of  nodes  from  any 
layer  if  they  do  not  contribute 
to  nodes  in  the  following  layer. 

3)  Elimination  of  nodes  within  any 
layer  if  they  do  not  occur  in  the 
proper  context,  as  defined  by  the 
templates  that  are  applicable  to 
that  layer. 

This  procedure  was  applied  to  a noisy 
waveform  defined  by  a four-layer  template 
hierarchy.  The  primitive  detection  pro- 
cess found  25  possible  primitives  in  this 
waveform,  but  at  successive  stages  of  the 
procedure,  most  of  these  were  eliminated, 
until  only  those  corresponding  to  the 
ideal  waveform  remained,  and  the  correct 
"parse"  of  the  waveform  (in  terms  of  the 
template  hierarchy)  was  obtained.  Exten- 
sions and  further  applications  of  this 
approach are planned, as discussed in [7].

COEFFICIENT  SELECTION  FOR  PROBABILISTIC 
RELAXATION 

In  a probabilistic  relaxation  pro- 
cess, the  class  probabilities  for  each 
object  are  adjusted  in  accordance  with  the 
support  that  they  receive  from  the  class 
probabilities  of  other  objects.  This 
support  is  usually  defined  in  terms  of  a 
set  of  coefficients,  one  for  each  pair  of 
(object,  class)  pairs.  A positive  co- 
efficient indicates  that  these  pairs 
mutually  reinforce,  a negative  coefficient 
indicates  that  they  conflict,  while  a 
near-zero  coefficient  indicates  a "don't- 
care"  situation. 

In earlier work [4], it is assumed
that  these  coefficients  can  be  defined 
from  an  understanding  of  the  given  classi- 
fication problem.  For  example,  consider 
the  problem  of  detecting  smooth  curves  in 
an image. Here each image point can belong
to a set of classes corresponding to
the  existence  of  curves  through  the  point 
at  various  orientations,  or  to  no  curve  at 
the  point.  The  curve  probabilities  in 
given  orientations  at  two  points  should 
support  one  another  if  curves  in  those 
orientations  through  the  two  points  would 
smoothly  continue  one  another.  Co- 
efficients representing  this  support  can 
be  defined  in  various  ways.  For  some 
choices  of  the  coefficients,  the  process 
is  highly  successful  in  enhancing  smooth 
curves  while  suppressing  noise,  while  for 
other  choices  it  is  less  successful. 

In [8], a method of automatically defining
coefficients for probabilistic relaxation
processes is introduced, based on
statistical analysis of the initial class
probabilities. In particular, given a
pair of (object, class) pairs $(O_1, C_1)$ and
$(O_2, C_2)$, let $E_1$, $E_2$ be the events that
$O_1$ is in $C_1$ and $O_2$ in $C_2$, respectively. The
mutual information of this pair of events
has just the properties that are desirable
in relaxation coefficients: it is
high if $E_1$ and $E_2$ tend to co-occur, and
low if they do not. We can estimate the
mutual information for each such pair of
events, by statistically analyzing a
suitable ensemble of input data, and use
the estimated values as relaxation coefficients.
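As an illustration of how such a coefficient could be estimated, the sketch below takes the mutual information of two events to be $\log[P(E_1, E_2) / (P(E_1) P(E_2))]$ and estimates each probability by averaging initial class probabilities over an ensemble of related object pairs. The estimator and the function name `mutual_information` are assumptions made for this example; the details of the method in [8] are not given in this note. All averaged probabilities are assumed to be non-zero.

```python
import math

def mutual_information(pairs):
    """Estimate log[P(E1, E2) / (P(E1) P(E2))] from an ensemble.

    pairs: list of (p1, p2), where p1 is the initial probability of E1 at
           one object and p2 the initial probability of E2 at a related
           object. Averages are assumed to be strictly positive.
    """
    n = len(pairs)
    p_e1 = sum(p1 for p1, _ in pairs) / n
    p_e2 = sum(p2 for _, p2 in pairs) / n
    # The joint probability is approximated by the mean of the products.
    p_joint = sum(p1 * p2 for p1, p2 in pairs) / n
    return math.log(p_joint / (p_e1 * p_e2))


co_occurring = mutual_information([(0.9, 0.9), (0.1, 0.1)])   # positive
conflicting = mutual_information([(0.9, 0.1), (0.1, 0.9)])    # negative
unrelated = mutual_information([(0.5, 0.5), (0.5, 0.5)])      # zero
```

The three cases correspond to the reinforcing, conflicting, and "don't-care" situations that the coefficients are meant to express.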

This  approach  was  applied  in  [8]  to 
the  curve  enhancement  problem.  For  each 
point  P of  an  input  image,  initial  prob- 
abilities of  there  being  curves  in 
various  orientations  through  P,  or  no 
curve  at  P,  were  derived  by  normalizing 
the  outputs  of  line  detection  operators 
applied  at  P.  Mutual  information  was 
then  computed  for  all  possible  pairs  of 
neighboring  points  and  curve  orientations 
(or no curve), averaged over all points
of  the  image.  When  the  resulting  mutual 
information  values  were  used  as  co- 
efficients in  a curve  enhancement  re- 
laxation process,  good  results  were 
obtained.  The  results  remained  good  when 
coefficients  derived  from  one  image  were 
applied  to  enhance  curves  in  an  image  of 
an  entirely  different  type.  These  ex- 
periments suggest  that  it  should  be 
possible,  in  many  cases,  to  derive  re- 
laxation coefficients  automatically  by 
statistically analyzing a suitable ensemble
of input data.

Abstract

Several techniques for use in a stereo vision system are
described. These include a stereo camera model solver, a
high-resolution stereo correlator for producing accurate matches with
accuracy and confidence estimates, a search technique for using
the correlator to produce a dense sampling of matched points
for a pair of pictures, and a ground surface finder for
distinguishing the ground from objects in the resulting
three-dimensional data. Possible ways of using these techniques
in an autonomous vehicle designed to explore its environment
are discussed. Examples are given showing the detection of
objects from a stereo pair of pictures, including some examples
using aerial photographs.


Introduction 

This paper describes a stereo vision system for use by a
computer-controlled vehicle which can move through a cluttered
environment, avoid obstacles, navigate to desired locations, and
build a description of its environment. One possible application
of such a vehicle is in planetary exploration. Our experimental
vehicle is described in [1].

As the vehicle moves about, it takes stereo picture pairs
from various locations. This could be done with two cameras
mounted on the vehicle, but with our present vehicle with one
camera, it is done with the vehicle at two locations. Each of
these stereo pairs is processed to extract the needed
three-dimensional information, and then this information from
different pairs can be combined in further processing.

The processing of the stereo pairs is done as follows.
First, an interest operator finds small features with high
information content in the first picture. Then, a binary search
correlator finds the corresponding points in the other picture.
(The interest operator and the binary search correlator were
both developed by Moravec [4].) Next, a high-resolution
correlator is given these matched pairs of points. It tries to
improve the accuracy of the match, and it produces an accuracy
estimate in the form of a two-by-two covariance matrix, and a
probability estimate giving the goodness of the match. The
coordinates of these matched points are corrected for camera
distortion as described by Moravec [1]. A stereo camera model
solver then uses these matched pairs of points to find the five
angles that relate the position and orientation of the two camera
locations. The accuracy estimates are used by the camera model
solver to weight the individual points in the solution and to
compute accuracy estimates of the resulting camera model. A
dense sampling of points is now matched over the pictures. The
known camera model is used to restrict the search for these
matches to one dimension, and by first trying matches
approximately the same as neighboring points that have already
been matched, often no search is needed. In any case, the
precise matches are produced by the high-resolution correlator,
and its probability estimates are used in guiding the search.
After these matched points are corrected for camera distortion,
distances to the corresponding points in three-dimensional space
are computed, using the known camera model. The accuracy
estimates of the matches and of the camera model are
propagated into accuracy estimates of the computed distances.
The three-dimensional information for all of the matched points
is now transformed into a coordinate system approximately
aligned with the horizontal surface. (The high-resolution
correlator, the stereo camera model solver, and the technique for
producing the dense sampling of matches are described later in
this paper.)

Information  from  more  than  one  stereo  pair  can  be 
combined  to  produce  a more  complete  mapping  of  points  over 
the  area.  A ground  surface  finder  is  then  used  to  find  the 
ground  for  portions  of  the  scene,  which  may  be  tilted  slightly 
relative  to  the  assumed  horizontal  coordinate  system.  (The 
ground  surface  finder  is  described  later  in  this  paper.)  Points 
which  lie  sufficiently  above  the  ground  surface  can  be  assumed 
to  lie  on  objects.  (In  the  process  of  finding  the  ground  surface 
and  finding  objects,  the  accuracy  and  probability  estimates  are 
useful.) 


Stereo  Camera  Model  Solver 


If the image plane coordinates of several pairs of
corresponding points in a stereo pair of images have been
measured, it is possible in general to use this information to
compute the relative position and orientation of the two
cameras, except for a distance scale factor. Once this calibration
has been performed, the distance to the object point represented
by each pair of image points can be computed.

A procedure  that  performs  the  above  stereo  camera  model 
calibration  by  means  of  a least-squares  adjustment  has  been 
written.  It  includes  automatic  editing  to  remove  wild  points,  the 
use  of  a two-by-two  covariance  matrix  for  each  point  for 
weighting  purposes,  estimation  of  an  additional  component  of 
variance  by  examination  of  the  residuals,  and  propagation  of 
error  estimates  into  the  results. 

Consider any point in the three-dimensional scene. Let
the coordinates of the image of this point in the Camera 1 film
plane be $x_1, y_1$ and the coordinates of its image in the Camera 2
film plane be $x_2, y_2$. Image point $x_1, y_1$ corresponds to a ray in
space, which, when projected into the Camera 2 film plane,
becomes a line segment. The distance (in the Camera 2 film
plane) from image point $x_2, y_2$ to the nearest point in this line
segment is the magnitude of the error in the matching of this
point. This error is a function of the angles which define the
relative position and orientation of the two cameras. (These
angles are the azimuth and elevation of the position of Camera
2 relative to the position of Camera 1, and the pan, tilt, and roll
of Camera 2 relative to the orientation of Camera 1.) The
camera calibration is done by adjusting these angles to minimize
the weighted sum of the squares of these errors for all of the
points that are used. Since the problem is nonlinear, the
procedure uses partial derivatives to approximate the problem
by the general linear hypothesis model of statistics, and iterates
to achieve the exact solution.

The  automatic  editing  is  done  as  follows.  First,  a 
weighted  least-squares  solution  as  described  above  is  done  using 
all  of  the  points.  Then  the  point  which  has  the  largest  ratio  of 
residual to standard deviation of the residual is found. This
point  is  tentatively  rejected,  and  the  solution  is  recomputed 
without  this  point.  If  this  point  now  disagrees  with  the  new 
solution  by  more  than  three  standard  deviations,  it  is 
permanently  rejected,  and  the  entire  process  repeats.  Otherwise, 
the  point  is  reinstated,  and  the  process  terminates.  However,  if 
an  F test  comparing  the  computed  and  given  values  of  the 
additional  variance  of  observations  shows  the  solution  that 
includes  the  point  to  be  bad,  the  point  in  question  is  rejected  in 
any  event. 
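The editing loop can be illustrated on a much simpler problem than the camera model. The sketch below is an assumption-laden stand-in: the five-angle adjustment is replaced by a weighted least-squares estimate of a single constant, the standard deviation of a residual is approximated by the point's own standard deviation, and the F test is omitted. The names `weighted_mean` and `edit_wild_points` are hypothetical.

```python
import math

def weighted_mean(values, sigmas):
    """Weighted least-squares estimate of a constant, with its variance."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    estimate = sum(w * v for w, v in zip(weights, values)) / total
    return estimate, 1.0 / total


def edit_wild_points(values, sigmas, limit=3.0):
    """Return (estimate, kept_indices) after iterative rejection."""
    kept = list(range(len(values)))
    while len(kept) > 2:
        est, _ = weighted_mean([values[i] for i in kept],
                               [sigmas[i] for i in kept])
        # Candidate: largest ratio of residual to standard deviation.
        worst = max(kept, key=lambda i: abs(values[i] - est) / sigmas[i])
        trial = [i for i in kept if i != worst]
        trial_est, trial_var = weighted_mean([values[i] for i in trial],
                                             [sigmas[i] for i in trial])
        # Compare the tentatively rejected point with the new solution.
        spread = math.sqrt(sigmas[worst] ** 2 + trial_var)
        if abs(values[worst] - trial_est) > limit * spread:
            kept = trial      # permanent rejection; repeat the process
        else:
            break             # reinstate the point and terminate
    est, _ = weighted_mean([values[i] for i in kept],
                           [sigmas[i] for i in kept])
    return est, kept


values = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0]
estimate, kept = edit_wild_points(values, [0.2] * len(values))
```

With these numbers the single wild value is rejected on the first pass, and the second pass reinstates its candidate and stops, leaving the five consistent points.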

A more  complete  description  of  the  camera  model  solver 
can  be  found  in  [1]. 


High-Resolution Correlator

Consider the following problem. A pair of stereo pictures
is available. For a given point in Picture 1, it is desired to find
the corresponding point in Picture 2. It will be assumed here
that a higher-level process has found a tentative approximate
matching point in Picture 2, and that there is an area
surrounding this point, called the search window, in which the
correct matching point can be assumed to lie. A certain area
surrounding the given point in Picture 1, called the match
window, will be used to match against corresponding areas in
Picture 2, with their centers displaced by various amounts
within the search window in order to obtain the best match.

Thus when the matching process (correlator) is given a
point in one picture of a stereo pair and an approximate
matching point in the other picture, it produces an improved
estimate of the matching point, suppressing the noise as much as
possible based on the statistics of the noise. It also produces an
estimate of the accuracy of the match in the form of the
variances and covariance of the x and y coordinates of the
matching point in the second picture, and an estimate of the
probability that the match is consistent with the statistics of the
noise in the pictures, rather than being an erroneous match.
This probability will be useful in guiding a higher-level search
needed to produce a dense sampling of matched points.

Let $A_1(x,y)$ represent the brightness values in Picture 1,
$A_2(x,y)$ represent the brightness values in Picture 2, $x_1, y_1$
represent the point in Picture 1 that we desire to match, $x_2, y_2$
represent the center of the search window in Picture 2, $w_m$
represent the width of the match window (assumed to be
square), and $w_s$ represent the width of the search window
(assumed to be square), where x and y take on only integer
values.

The following assumptions are made. $A_1$ and $A_2$ consist
of the same true brightness values displaced by an unknown
amount in x and y, with normally distributed random errors
added. The errors are uncorrelated with each other, both
within a picture (autocorrelation) and between pictures (cross
correlation), and the errors are uncorrelated with the true
brightness values. (The assumptions concerning errors hold
fairly accurately for the usual noise content of pictures. The
assumption concerning the true brightness values will be relaxed
slightly below to allow brightness bias and contrast changes.
However, another type of change is perspective distortion, which
can be important with large match windows, but it will not be
discussed here.)

We temporarily assume that the variance of the errors is
known for every point in each picture.

We now wish to find the matching point $x_m, y_m$ which
will produce the best match of $A_2(x + x_m - x_1, y + y_m - y_1)$ to $A_1(x,y)$
in some sense. Traditionally the match which maximized the
correlation coefficient between $A_1$ and $A_2$ has been used [2].
Indeed, this is a reasonable thing to do if one of two functions
has no noise. However, here both functions have noise. This
fact introduces fluctuations in the cross-correlation function
which may cause its peak to differ from the expected value. Ad
hoc smoothing techniques could be used to reduce this effect, but
an optimum solution can be derived from the assumed statistics
of the noise.

Let $\epsilon$ represent the $w_m^2$-vector of the differences
$A_2(x + x_m - x_1, y + y_m - y_1) - A_1(x,y)$ over the $w_m$ by $w_m$ match
window, for a given trial value of $x_m, y_m$, and let $x_t, y_t$ represent
the true (unknown) value of $x_m, y_m$. Let P represent a
probability and p represent a probability density with respect to
the vector $\epsilon$. Then by Bayes' theorem

$$P(x_m,y_m = x_t,y_t \mid \epsilon) = \frac{p(\epsilon \mid x_m,y_m = x_t,y_t)\, P(x_m,y_m = x_t,y_t)}{\sum p(\epsilon \mid x_m,y_m = x_t,y_t)\, P(x_m,y_m = x_t,y_t)}$$

If we assume that the a priori probability $P(x_m,y_m = x_t,y_t)$ is
constant over the search window and is zero elsewhere, this
reduces to

$$P(x_m,y_m = x_t,y_t \mid \epsilon) = k\, p(\epsilon \mid x_m,y_m = x_t,y_t)$$

where k is any constant of proportionality. Since $\epsilon$ consists of
uncorrelated normally distributed random variables,

$$p(\epsilon \mid x_m,y_m = x_t,y_t) = \prod_i \frac{1}{\sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}} \exp\left(-\frac{\epsilon_i^2}{2(\sigma_1^2 + \sigma_2^2)}\right) \propto w$$

where

$$w = \exp\left(-\frac{1}{2} \sum_i \frac{\epsilon_i^2}{\sigma_1^2 + \sigma_2^2}\right)$$

and where $\epsilon_i$ denotes the components of $\epsilon$, $\sigma_1$ and $\sigma_2$ are the
standard deviations of $A_1$ and $A_2$, and the product and sum are
taken over the match window. (Very often, the variances
$\sigma_1^2$ and $\sigma_2^2$ can be considered to be constant. In this case, the
summation can be reduced to the sum of the squares of the
differences over the match window, with the sum of the two
variances factored out.) Thus,

$$P(x_m,y_m = x_t,y_t \mid \epsilon) = k w$$


So far, the derivation is quite usual. If we simply wanted
to maximize P (for the maximum likelihood solution), we would
minimize the above sum (that is, use a weighted least-squares
solution). However, because of the fluctuations in w caused by
the presence of noise in both images, the peak of P in general
differs from the center of the distribution of P in a random way
due to the random nature of the errors.

Therefore, we define the optimum estimate of the
matching position to be the mathematical expectation of $x_m, y_m$
according to the above probability distribution. Thus, letting
$(x_0, y_0)$ represent this optimum estimate, we have

$$x_0 = \frac{\sum w\, x_m}{\sum w}, \qquad y_0 = \frac{\sum w\, y_m}{\sum w}$$

where the sums are taken over the search window. The
variances and covariance of $x_0$ and $y_0$ are given by the second
moments of the distribution around the expected values:

$$\sigma_x^2 = \frac{\sum w\,(x_m - x_0)^2}{\sum w}, \qquad \sigma_y^2 = \frac{\sum w\,(y_m - y_0)^2}{\sum w}, \qquad \sigma_{xy} = \frac{\sum w\,(x_m - x_0)(y_m - y_0)}{\sum w}$$

The covariance matrix of $x_0$ and $y_0$ consists of $\sigma_x^2$ and $\sigma_y^2$ on
the main diagonal and $\sigma_{xy}$ on both sides off the diagonal.
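The moment computation can be sketched as follows for the constant-variance case, where the weight of each trial position is the exponential of minus half the sum of squared differences divided by $v = \sigma_1^2 + \sigma_2^2$. This is an illustrative sketch only: the function name `correlate`, the image layout (lists of rows), and the requirement of odd window widths are assumptions, and none of the corrections discussed later in the paper (variance estimation, interpolation, bias and contrast) are included.

```python
import math

def correlate(a1, a2, x1, y1, x2, y2, wm, ws, v):
    """Expected match position of the window around (x1, y1) of a1 in a2.

    a1, a2: images as lists of rows, indexed a[y][x]
    wm, ws: odd widths of the match and search windows
    v:      assumed constant value of sigma_1^2 + sigma_2^2
    Returns (x0, y0, var_x, var_y, cov_xy).
    """
    hm, hs = wm // 2, ws // 2
    candidates = []
    for ym in range(y2 - hs, y2 + hs + 1):
        for xm in range(x2 - hs, x2 + hs + 1):
            # Sum of squared differences over the match window.
            ssd = sum(
                (a2[ym + dy][xm + dx] - a1[y1 + dy][x1 + dx]) ** 2
                for dy in range(-hm, hm + 1)
                for dx in range(-hm, hm + 1)
            )
            candidates.append((xm, ym, ssd))
    # Subtracting the smallest sum rescales every weight by a common
    # factor (absorbed by k) and avoids underflow in exp().
    best = min(s for _, _, s in candidates)
    weights = [(xm, ym, math.exp(-0.5 * (s - best) / v))
               for xm, ym, s in candidates]
    total = sum(w for _, _, w in weights)
    x0 = sum(w * xm for xm, _, w in weights) / total
    y0 = sum(w * ym for _, ym, w in weights) / total
    var_x = sum(w * (xm - x0) ** 2 for xm, _, w in weights) / total
    var_y = sum(w * (ym - y0) ** 2 for _, ym, w in weights) / total
    cov_xy = sum(w * (xm - x0) * (ym - y0) for xm, ym, w in weights) / total
    return x0, y0, var_x, var_y, cov_xy
```

When one trial position fits far better than the rest, its weight dominates and the expectation coincides with the least-squares match; when several positions fit almost equally well, the expectation lies between them and the variances grow accordingly.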

It  might  appear  that  the  above  analysis  is  not  correct 
because  of  the  fact  that  certain  combinations  of  errors  at  each 
point  of  each  picture  are  possible  for  more  than  one  match 
position,  and  the  probability  of  these  combinations  is  split  up 
among  these  match  positions.  However,  this  fact  does  not 
influence  the  results,  as  can  be  seen  from  the  following 
reasoning.  The  possible  errors  at  each  point  of  each  picture 
form  a multidimensional  space.  When  a particular  match 
position is chosen, a lower-dimensioned subspace of this space is
selected,  in  order  to  be  consistent  with  the  measured  brightness 
values.  When  another  match  is  chosen,  a different  subspace  is 
selected.  These  two  subspaces  in  general  intersect,  if  at  all,  in  a 
subspace  of  an  even  lower  number  of  dimensions.  Thus  the 
hypervolume  (in  the  higher  subspace)  of  this  lower  subspace  is 
zero.  Therefore,  the  fact  that  the  two  subspaces  intersect  does 
not  change  the  computed  probabilities. 

Now suppose that the standard deviations $\sigma_1$ and $\sigma_2$ are
not known. It is possible to estimate them (actually, the sum of
their squares, which is what is needed in the equation for w)
from the data if it is assumed that they are constant, that is, the
noise does not vary across the pictures. Let v equal the constant
value of $\sigma_1^2 + \sigma_2^2$. Then $\epsilon \cdot \epsilon / w_m^2$ (the mean square value of the
components of $\epsilon$) is an estimate for v, where $\cdot$ denotes the vector
dot product. However, this value is different for each possible
match position $x_m, y_m$. The method used to obtain the best
value for v is to average all of these values for v, weighted by
the probability for each match position, $P(x_m,y_m = x_t,y_t \mid \epsilon) = k w$.
Thus a preliminary variance estimate (written $v'$ here) is computed by

$$v' = \frac{\sum w\,(\epsilon \cdot \epsilon / w_m^2)}{\sum w}$$

where the sums are taken over the search window. However,
this averaging process introduces a bias because of the statistical
tendency for the smaller values to have the greater weights. It
can be shown that this effect causes the estimate of variance to
be too small by a ratio that can be anywhere from .5 to 1.
Therefore, an empirically determined approximate correction
factor is applied to the variance estimate as follows:

$$v = \frac{v'}{1 - 0.6\,(1 - u/v')^{n}}$$

where u is the minimum value of $\epsilon \cdot \epsilon / w_m^2$ over the search
window, and n is an exponent that is not legible in this copy.
Since the computation of w requires the value of
$\sigma_1^2 + \sigma_2^2$ (= v), the above process is iterative.

An estimate of an upper limit to the variance is also
computed from the high-frequency content of the pictures. First,

$$U \propto [A(x-1,y) + A(x+1,y) + A(x,y-1) + A(x,y+1) - 4A(x,y)]^2$$

Then U is averaged over an appropriate local window and the
results for the two pictures are added together to form the
estimate of the upper limit of v.
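A sketch of this computation for one picture is given below. The normalizing factor of 1/20 is an assumption, not taken from the text: it is the value that makes U an unbiased estimate of the noise variance when the true brightness is locally linear and the errors at the five points are independent (the bracketed sum then has variance $4\sigma^2 + 16\sigma^2$). The function name `high_frequency_variance` and the square local window are also hypothetical.

```python
def high_frequency_variance(image, x, y, half):
    """Average of U over a (2*half + 1)-wide window centered at (x, y).

    image: list of rows, indexed image[y][x]; the window plus a one-pixel
           border must lie inside the image.
    """
    total, count = 0.0, 0
    for yy in range(y - half, y + half + 1):
        for xx in range(x - half, x + half + 1):
            # Four neighbors minus four times the center value.
            lap = (image[yy][xx - 1] + image[yy][xx + 1]
                   + image[yy - 1][xx] + image[yy + 1][xx]
                   - 4 * image[yy][xx])
            total += lap * lap / 20.0   # assumed normalization (see above)
            count += 1
    return total / count


# A brightness ramp has no high-frequency content; a checkerboard has a lot.
ramp = [[3 * x + 2 * y for x in range(7)] for y in range(7)]
checker = [[(x + y) % 2 for x in range(7)] for y in range(7)]
ramp_limit = high_frequency_variance(ramp, 3, 3, 1)
checker_limit = high_frequency_variance(checker, 3, 3, 1)
```

Adding the values obtained for the two pictures would then give the upper limit on v described in the text. Because real scene detail also contributes to U, the result bounds the noise variance from above rather than estimating it directly.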

The overall variance estimate used in the above equations
is obtained by an appropriate weighted combination of the a
priori given value, the derived value, and the computed upper
limit.

The probability of a correct match is computed by
comparing the derived variance to the a priori variance and the
upper limit (high-frequency variance) by means of F-tests.

Because  of  the  finite  window  size,  the  computed 
covariance  matrix  will  be  an  under-estimate.  An  approximate 
correction  for  this  effect  is  made  by  computing  the  eigenvalues 
and  eigenvectors  of  the  covariance  matrix,  applying  a correction 
to the eigenvalues, and then reconstructing the covariance
matrix  from  the  eigenvalues  and  eigenvectors. 

The above computations assume that the shift between
the two pictures is always an integer number of pixels. In cases
where the correlation peak is broad, the smoothing process
inherent in the moment computation for $x_0$, $y_0$, $\sigma_x^2$, $\sigma_y^2$, and
$\sigma_{xy}$ causes a reasonable interpolation to be performed if the
correct answer lies between pixels. However, when the
correlation peak is sharp, this will not happen, and the answer
will tend towards the nearest pixel to the correct best match.
This is not particularly serious insofar as it affects the position
estimate, but it can have a serious effect on the probability
estimate. This is because the $\epsilon$ vector should be much smaller at
the correctly interpolated point than it is at the nearest pixel,
because of the sharp peak. Therefore, the probability may come
out much too small, indicating a bad match, whereas the match
is really good but lies between pixels. To overcome this
deficiency, linear interpolation adjustments are made to the
variance and probability, and the covariance matrix is
augmented to allow for interpolation error.

Since there may be changes in brightness and contrast
between the two pictures of the stereo pair, the correlator can
adjust a bias and scale factor relating the brightness values in
the two pictures. This requires modifying the mathematics
given above. Instead of actually using the sum of squares of
differences $\epsilon \cdot \epsilon$ in the above equations, the moment about the
principal axis of the function relating the two sets of brightness
values is used. However, the sum of the squares of the
differences is still the main ingredient in this computation.
Included in this computation are a priori weights on the given
values of brightness bias and scale factor (contrast). Thus the
bias and scale factor can be constrained according to the amount
of knowledge about them from other sources, if any.

As stated above, when the variance is assumed to be
constant, a major portion of the computation is the sum of
squares of differences $\epsilon \cdot \epsilon$. This is computed by a very
efficiently coded method developed by Moravec. Its inner
loop (each term of the summation) requires about one
microsecond on the PDP KL10.

Searching  for  Stereo  Matches 

Once the stereo camera model is known, the search for
matching points in the two pictures is greatly constrained. A
point in Picture 1 corresponds to a ray in space, which, when
projected into Picture 2, becomes a line segment terminating at
the point corresponding to an infinite distance along the ray.
Furthermore,  by  first  trying  a match  with  approximately  the 
same  stereo  disparity  as  neighboring  points  that  already  have 
been  matched,  the  search  can  be  eliminated  for  many  points. 
One  criterion  for  deciding  when  to  accept  this  tentative  match  is 
the  probability  value  returned  by  the  high-resolution  correlator. 
Also,  when  a search  is  made,  the  likeliest  correct  match  is 
indicated  by  the  highest  probability  value. 

The  method  used  here  is  similar  in  some  ways  to 
matching  techniques  used  by  others  (for  example,  Quam  [5]  and 
Hannah  [2]).  However,  there  is  no  region  growing  in  the  sense 
of  Hannah,  since  the  equivalent  operations  are  left  until  later  in 
the  processing.  Instead,  the  stereo  disparities  are  allowed  to 
vary  in  an  arbitrary  way  over  the  picture,  subject  to  some  local 
constraints discussed later. Furthermore, the acceptance of
matches  is  guided  by  the  probability  values.  Also,  even  in  areas 
of  low  information  content,  the  noise  suppression  ability  of  the 
high-resolution correlator often allows useful results to be
obtained.  If  the  content  is  too  low,  the  correlator  indicates  this 
fact  by  producing  very  large  values  for  the  standard  deviations 
of  the  two  position  coordinates.  When  this  happens,  the 
searching  can  be  inhibited  to  save  computer  time,  but  even  if 
this  is  not  done,  the  results  are  uill  as  good  as  the  standard 
deviations  indicate.  (Actually,  the  correct  test  to  indicate  no 
useful  information  is  to  see  if  both  eigenvalues  of  the 
covariance  matrix  are  large.  Both  standard  deviations  might  be 
large,  but  if  only  one  eigenvalue  is  large,  an  accurate  distance 
can  still  be  computed  for  this  point  unless  the  corresponding 
eigenvector  is  almost  parallel  to  the  projected  line  segment.) 
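
The eigenvalue test in the parenthetical remark above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the variance threshold max_var, and the parallelism threshold max_cos are hypothetical and are not taken from the system described.

```python
import math

def covariance_eigen(sxx, sxy, syy):
    """Eigenvalues and major-axis unit vector of [[sxx, sxy], [sxy, syy]]."""
    mean = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    # Orientation of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return mean + radius, mean - radius, (math.cos(theta), math.sin(theta))

def classify_match(sxx, sxy, syy, line_dir, max_var=4.0, max_cos=0.9):
    """Classify a match from its position covariance (thresholds are assumed)."""
    lam_big, lam_small, v_big = covariance_eigen(sxx, sxy, syy)
    if lam_small > max_var:
        # Both eigenvalues are large: no useful information at this point.
        return "no_information"
    if lam_big > max_var:
        # Only one eigenvalue is large: the distance is still usable unless
        # the uncertain direction runs almost along the projected line segment.
        dot = v_big[0] * line_dir[0] + v_big[1] * line_dir[1]
        if abs(dot) / math.hypot(*line_dir) > max_cos:
            return "distance_unreliable"
    return "usable"
```

For example, a covariance that is large only across the projected line segment still yields a usable distance, whereas the same covariance rotated to lie along the segment does not.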

The method currently used is approximately as follows:

1. Divide Picture 1 into square windows, denoted here as
"areas", the center of each of which is considered to be a point
to be matched to the center of a similar area in Picture 2 in the
following steps. (These areas normally would be equal in size to
the match window of the high-resolution correlator.)

2. Select a set of starting areas. (Currently a column near the
edge of the picture is used, but this will soon be changed to the
points which were produced by the interest operator and
binary-search correlator and were not rejected by the camera
model solver.)

3. Try areas adjacent (including diagonally adjacent) to areas
already tried, where possible working in the direction of the
projected line segments in Picture 2 towards the infinity points.

4. If there are at least two already matched areas adjacent to
the area in question and the disparities of all adjacent matched
areas agree within a tolerance, apply the high-resolution
correlator with the search window centered on the position
corresponding to the average disparity of these neighbors.
Otherwise, go to 6.

5. If the probability returned by the correlator in step 4 is
greater than 0.1, accept this match and go to 8.

6. Starting at the infinity point, search along the projected line
segment in Picture 2, applying the search window of the
high-resolution correlator at points with a spacing of half of the
search window width, but not at previously matched areas.

7. Of those matches found in step 6, select the one for which
the correlator returned the highest probability. If this
probability is greater than 0.1 and at least one neighboring area
(including these tentative matches) agrees in disparity and has a
probability greater than 0.01, or vice versa, accept this match.
Otherwise, of those matches found in step 6 with probability
greater than 0.1, if any, accept the one whose disparity agrees
most closely with its neighbors, if within the tolerance.

8.  When  the  current  group  of  areas  being  tried  is  exhausted,  go 
to  3.  If  there  are  no  areas  left,  finish. 
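
The per-area logic of steps 4 through 7 can be sketched as follows. This is a simplified illustration, not the program described: the callback correlate, which stands in for the high-resolution correlator and returns a (disparity, probability) pair, is hypothetical; "agree within a tolerance" is assumed to mean that the spread of the neighboring disparities does not exceed the tolerance; and the neighbor-agreement checks of step 7 are omitted.

```python
def predicted_disparity(neighbor_disparities, tolerance):
    """Step 4 test: average disparity of matched neighbors, or None."""
    if len(neighbor_disparities) < 2:
        return None
    if max(neighbor_disparities) - min(neighbor_disparities) > tolerance:
        return None
    return sum(neighbor_disparities) / len(neighbor_disparities)

def match_area(neighbor_disparities, tolerance, correlate,
               search_disparities, accept=0.1):
    """Return (disparity, probability) for one area, or None if no match."""
    guess = predicted_disparity(neighbor_disparities, tolerance)
    if guess is not None:
        disparity, probability = correlate(guess)      # step 4
        if probability > accept:                       # step 5
            return disparity, probability
    # Step 6: try positions along the projected line segment, listed
    # starting from the infinity point.
    results = [correlate(d) for d in search_disparities]
    if not results:
        return None
    # First part of step 7: keep the most probable candidate.
    best = max(results, key=lambda result: result[1])
    return best if best[1] > accept else None
```

When the neighbors agree, only one correlator call is made; the search of step 6 is reached only when the prediction is unavailable or is rejected.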

Some improvements can be made to this algorithm in the
future. For example, another pass can be made over the data to
clean things up, utilizing the fact that most areas have more
matched neighbors than they did when things were progressing
in a basically one-directional manner. Another possibility is to
change step 7 in the following way. The best match from those
found in step 6 would not be selected immediately. Instead, all
of the potential matches with sufficiently high probability would
be saved until the entire picture had been processed. Then a
cooperative algorithm similar to that discussed by Marr and
Poggio [3] could be used to choose the best matches. This
should produce more reliable matches, but with a large increase
in computation time.


Ground  Surface  Finder 


Once the three-dimensional positions of a large number of
points in an outdoor scene have been determined, it is desired to
determine which points are on the ground and which are on
objects above the ground. By taking a sufficiently small portion
of the scene the ground can be approximated by a simple
surface whose equation can be determined, and the points which
lie above this surface by more than an appropriate tolerance can
be assumed to be on objects above the ground.

Such  a procedure  has  been  written,  which  assumes  in 
general  that  the  ground  surface  is  a two-dimensional  second 
degree  polynomial.  However,  weights  can  be  given  to  a priori 
values  of  the  polynomial  coefficients,  to  incorporate  any  existing 
knowledge  about  the  ground  surface  into  the  solution.  For 
example,  the  second  degree  terms  can  be  weighted  out  of  the 
solution  altogether,  so  that  the  ground  surface  reduces  to  a 
plane. 

To determine a ground surface from a given set of data, a
set of criteria which define what is meant by a good ground
surface is needed. These include the number of points within
tolerance of the surface (the more the better), the number of
points which lie beyond tolerance below the surface (the fewer
the better, since these would be due to errors such as
mismatched points in a stereo pair), and the closeness of the
surface coefficients to the a priori values. Note that the number
of points above the surface does not matter (other than that it
detracts from the number within the surface), because many
points can be on objects above the ground. A score for any
tentative solution is computed based on these criteria, and the
solution with the highest score is assumed to be correct,
although a solution with a lower score can be selected by a
higher level procedure using more global criteria. The scoring
function currently used is

[equation not recoverable from the source]

where N is the number of points within tolerance of the surface
(these points were used to determine the surface by a
least-squares fit), n is the a priori expected number of points in
the surface, B is the number of points below the surface by
more than the tolerance, b is the a priori approximate maximum
number of points below the surface, the c_j are the coefficients of
the fitted surface, the c_j' are their a priori values, the σ_j are the
standard deviations of these a priori values, and m is the
number of these coefficients which were adjusted.

Finding the best solution (according to the scoring
function) out of all of the possible solutions is a search problem.
What is needed is a method which will be likely to find the
correct solution without requiring huge amounts of computer
time. The method adopted relies on some heuristics to lead the
search to the desired solution. Its main points can be described
briefly as follows.

First, a least-squares solution is done using all of the
points. This fit is saved for refinement leading to one tentative
solution. Then all points within tolerance of this fit or too low,
but not less than half of the points used in this fit, are selected,
and another least-squares fit is done on these points and saved.
This process repeats until there are too few points left. (This
portion of the algorithm drives downward to find the low
surfaces, even though there may be a large amount of clutter above
them.)
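
One possible reading of the selection rule in this downward-driving stage is sketched below. It assumes that residuals are heights above the current fit (negative values lie below it), and that "not less than half" means the lowest half of the points is kept whenever fewer than half qualify; both the function name and this interpretation are assumptions made for illustration.

```python
def select_low_points(residuals, tolerance):
    """Indices of points within tolerance of the fit or below it.

    If fewer than half of the points qualify, the lowest half is returned
    instead, so that each pass keeps at least half of its input.
    """
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    n_within_or_low = sum(1 for r in residuals if r <= tolerance)
    n_keep = max(n_within_or_low, (len(residuals) + 1) // 2)
    return sorted(order[:n_keep])
```

Repeating a least-squares fit on the selected points and selecting again moves the fit toward the lowest surface in the data.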

The refinement of each of the above fits is done as
follows. The standard deviation of the points used in the fit
about the fitted surface is computed. Then all points within one
standard deviation (or within the original tolerance) of the
surface are used in a new least-squares fit. This process
continues until it stabilizes, in which case the score of the result
is computed, or until there are too few points in the solution.
(This portion of the algorithm rejects erroneous points and
some clutter, in order to find well-defined surfaces.)
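
The refinement loop can be illustrated with a minimal Python sketch for the planar case (second degree terms weighted out). It is not the program described: a priori weights are omitted, the acceptance limit is assumed to be the larger of the standard deviation and the tolerance, candidate points are assumed to be drawn from the full input set on every pass, and all names and the min_points and max_iter parameters are hypothetical.

```python
import math

def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))) / a[i][i]
    return tuple(x)

def fit_plane(points):
    """Least-squares fit of z = c0 + c1*x + c2*y via the normal equations."""
    s = [[0.0] * 3 for _ in range(3)]
    t = [0.0] * 3
    for x, y, z in points:
        row = (1.0, x, y)
        for i in range(3):
            t[i] += row[i] * z
            for j in range(3):
                s[i][j] += row[i] * row[j]
    return solve3(s, t)

def residual(point, coeffs):
    x, y, z = point
    return z - (coeffs[0] + coeffs[1] * x + coeffs[2] * y)

def refine(points, tolerance, min_points=3, max_iter=50):
    """Refit repeatedly on the points near the surface until the set is stable."""
    used = list(points)
    coeffs = None
    for _ in range(max_iter):
        if len(used) < min_points:
            return None                      # too few points in the solution
        coeffs = fit_plane(used)
        sigma = math.sqrt(sum(residual(p, coeffs) ** 2 for p in used) / len(used))
        limit = max(sigma, tolerance)        # assumed reading of the "or" above
        kept = [p for p in points if abs(residual(p, coeffs)) <= limit]
        if kept == used:
            return coeffs, kept              # stabilized
        used = kept
    return coeffs, used
```

On data consisting of a plane plus a few points well above it, the first pass already excludes the high points and the second pass confirms a stable fit.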

Results 

Figure 1 shows a stereo pair of photographs taken from
positions approximately 1.6 feet apart in a parking lot. Each
digitized picture is 270 pixels wide and 210 pixels high.

Figure 2 shows the points found in the left picture by the
interest operator and the corresponding points (using the same
arbitrary symbols) matched in the right picture by the binary
search correlator. The points encircled were rejected because of
low probability (<0.1) estimated by the high-resolution
correlator. The points in squares were rejected by the editing in
the camera model solver. The remaining points then
determined the camera model solution.

Figure 3 shows the matches produced by the searching
algorithm, constrained by this camera model, using the
high-resolution correlator (with eight-by-eight windows). Notice
that the algorithm made several incorrect matches, particularly
in the left foreground. This is a result of the fact that there was
very little contrast in the texture on the pavement, resulting in a
low signal-to-noise ratio. Nevertheless, there are a sufficient
number of correct matches so that the later stages of processing
are not bothered by these errors.

The left side of Figure 4 shows the component of distance
(in feet) parallel to the optical axis, computed from the matches
for all points that the algorithm matched, superimposed on the
left picture. (Single characters are used for these plots, with 0
through 9, A through Z, and a through z representing 0
through 61. The number sign represents values from 62 to 100,
and the infinity symbol represents everything greater than 100.)
The right side of Figure 4 shows the relative standard deviation
computed for these points, to the nearest foot. (The relative
standard deviation indicates only those errors which tend to
differ for nearby points. The total standard deviation is larger
and indicates the absolute accuracy of the distances.)
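
The single-character code used in these plots is easy to reproduce. The sketch below follows the mapping stated in the text; rounding to the nearest integer and rejecting negative values are assumptions, since the text does not say how values were quantized or how negative values were shown.

```python
import string

# 0-9, A-Z, a-z give 62 symbols for the values 0 through 61.
SYMBOLS = string.digits + string.ascii_uppercase + string.ascii_lowercase

def encode_value(value):
    """Map a non-negative value to the single plot character described above."""
    n = int(round(value))          # assumed quantization
    if n < 0:
        raise ValueError("negative values are not covered by the code")
    if n <= 61:
        return SYMBOLS[n]
    if n <= 100:
        return "#"
    return "\u221e"                # infinity symbol
```

With this table, 10 prints as "A", 35 as "Z", 36 as "a", and 61 as "z", in agreement with the legend of Figure 12.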

Figure 5 shows the results of transforming the
three-dimensional information represented by the distances into
an approximately horizontal coordinate system. The Camera 1
position is at the bottom center of the figure, looking towards
the top. Plotted are the heights above the reference plane in
feet, using the single characters described above.

The portion of the data between ranges of 10 feet and 60
feet was given to the ground surface finder. The heights of
these points above the resulting plane are shown in Figure 6.

Figure 7 is the same as Figure 6, except that it shows only
those points with a height of at least two feet. The points above
this threshold are on the two vehicles in the pictures, the light
poles, and some shrubbery near the light poles, with a few error
points.

The results of processing a pair of aerial photographs are
shown in Figures 8 through 11. Figure 8 shows the stereo
displacements in pixels computed by the searching algorithm for
a stereo pair of pictures, superimposed on one of the pictures in
the proper positions. (The steps used to obtain the camera
model are not shown.) Since the range of heights in the picture
is small compared to the height of the camera, these stereo
displacements are approximately proportional to heights above
an arbitrary plane. Figure 9 shows heights above the ground
plane found by the ground surface finder, and Figure 10 shows
only those points with heights that would be expected for points
on cars. Figure 11 shows heights scaled so that 1 is
approximately the height of the hood or fender of a car, 2 is
approximately the height of the roof of a car, and 3 through 6
represent heights that might be found on large trucks. In this
figure, values of 0 represent points that appear to be on the
ground.

Figures 12 through 14 show some results for another pair
of aerial photographs with lower resolution. Figure 12 shows
heights above an arbitrary plane, and Figure 13 shows heights
above the computed ground plane. The points that appeared to
be sufficiently above the ground plane to be possibly on
buildings were given to the ground surface finder again, with
instructions to find a plane parallel to the ground plane, and it
found the surface corresponding to the horizontal roof of the
large building in the picture. Figure 14 shows the points in the
original picture that appear to be on the ground (marked "O")
and on the roof (marked "R").

Future  Plans 

It is planned to use the stereo system in a system for
operating an experimental exploring vehicle. Stereo pairs will
be taken from various locations and their results will be
combined. Points which lie sufficiently above the ground will
be clustered into individual objects, and simple size and shape
information will be computed for each object. A data structure
containing a catalogue of objects, with their locations, sizes, and
shapes, and properties of the ground, will be built up as the
vehicle moves through its environment. By comparing this
information to older portions of the data structure, the vehicle
can determine if it is in a previously seen area.

There are several opportunities for the previously
described components of the system to cooperate and to pass
information back and forth. For example, the high-resolution
correlator has several parameters used to give it a priori
information on the noise level in the data and changes in
brightness and contrast between the two pictures. It also
produces a posteriori estimates of these quantities. These results
from early applications of the correlator (for example with the
points used to obtain the camera model and not rejected by the
camera model solver) can be given to the correlator in later
applications. Also, when the ground surface finder is given
points from a certain portion of the scene, it can be given a
priori values and weights for the surface, which can be obtained
from ground surface solutions for adjacent areas. Furthermore,
if its apparently best solution does not agree with those of
adjacent areas, an alternate solution from the ground surface
finder can be used which agrees better with neighboring areas,
even though it may not be as good according to local criteria.
All of these features may be implemented in the future.

Other  possibilities  include  an  automatic  segmenter  to 
produce  regions  of  complicated  terrain  for  the  ground  surface 
finder  to  work  on,  and  the  addition  of  a priori  knowledge  about 
the environment, including models of objects expected.

Figure 2. Matching points found by binary-search correlator, rejected by
high-resolution correlator (circled), and rejected by camera model (boxed).


Figure 3. Matching points found by search algorithm using camera model and high-resolution correlator.

Figure 4. Distances (left) and relative standard deviations (right) to
points in left picture of Figure 1. The symbols are defined in the text.


Figure 5. Heights above reference plane.


Figure 12. Arbitrary heights (stereo displacements) superimposed on one picture of a stereo pair.
("A" through "Z" represent 10 through 35, and "a" through "z" represent 36 through 61.)

Abstract

The development of image understanding systems
requires the parallel development of software tools to
measure and analyze their performance and behavior. This
paper briefly describes the design of a sensor database and
discusses its use in performance evaluation of segmentations
and labeling.

Introduction 

Many problems arise in the design of image
understanding systems which require large amounts of
image data to be readily available for use in a variety of
experiments. Such databases must contain not only the
images themselves (signal description of the scene) but also
symbolic descriptions of the content of the image. A
symbolic description is one where a symbolic name is used
to represent a collection of points in the image. A symbolic
name associates a real-world object or sub-object name
with a collection of picture points. Usually one needs a
hierarchy of symbolic levels where each level represents a
different conceptual abstraction of the image. The
maintenance of a hierarchy of these representations is
necessary in order to define and analyze the structural
relationships between symbolic levels. Both the symbolic
and signal descriptions must be organized into a database
which allows efficient and accurate representations of
images.

While the number and type of representations may
vary from task to task, it is important that the database
provide mechanisms to add or modify representations as
necessary. The particular representation paradigm chosen
should be applicable to a wide variety of images. The grain
of the representation should allow for efficient mappings of
the signal (image) into the representation space. MIDAS, a
multi-sensor image database system, is currently under
development at CMU and attempts to fulfill these design
goals (McKeown and Reddy, 1977).

MIDAS  Organization 

MIDAS is composed of three interactive subsystems
(Figure 1). First, the QUERY system can locate images with
particular attributes such as Sensor, Scene type, Source of
image, or Owner, e.g. "all color cityscape scenes processed by
Ohlander". The CATLOG system contains functions to insert,
delete and modify image representations. The PICPAC
system provides general picture modification, analysis, and
manipulation procedures.

MIDAS maintains multiple data structures in order to
efficiently represent both idealized and experimentally
generated scene descriptions. The primary data structure is
a set of text files which contain hierarchical symbolic
descriptions. Other data structures include a relational
database used primarily for interactive query.

FIGURE 2 Image Description File

Representation. Figure 2 is a portion of an image
description file (IDF) for a Pittsburgh city scene. Some
obvious features such as minimum bounding rectangle,
center of mass, and number of pixels are generated
automatically by SYRIUS, a system for interactive guidance
in the generation of symbolic descriptions (Smith and
Reddy, 1977). Maskpoints is a vector list which gives the
outline of a region. Other feature attributes including
structural relations can be added incrementally by MIDAS or
other programs.

A hierarchical symbolic description of an image is one
where different conceptual abstractions of the image are
represented by levels of the hierarchy. The description of
the scene is complete at each level of abstraction and
therefore, depending on the detail of analysis required, a
higher or lower level representation may be chosen. Each
description is generated in terms of symbolic names for
objects found in the image at that level and the structural
relationships between these objects. Structural relations
are concepts such as above, left-of, composed-of, and
vertical.

A careful study of the capabilities necessary to
perform the above leads us to the following general
requirements for our database system. The database should
provide uniform access functions to all images, and allow the
maintenance of partial and alternate representations of
images. The notion that the database is only partially
complete with respect to the description of each image
requires that the incremental updating of image descriptions
be handled in an efficient and consistent manner. Extracted
features should be stored with each image description so
that the database can be used as a medium for knowledge
acquisition. We impose a hierarchy of representations for
each image which allows for the efficient evaluation of
systems performance at different representational levels. At
each representational level there are structures which are
given symbolic names and have associated feature sets
which were used to derive the structure itself.

FIGURE 3 MIDAS Relations


Figure 3 gives a list of the current working set of
relations. Each relation contains several attributes to which
particular values can be associated. Some of the attributes
are obvious image features: byte size, number of rows, and
number of sensor bands. Others are symbolic features:
symbolic name and level, and feature values, such as size,
orientation, and minimum bounding rectangle.

The relational database is generated automatically by
MIDAS using a subset of the information contained in the
image description files (IDF). Updating the database is
performed by modifying the text files which must then be
compiled into the relational database. The advantage to this
is that the relational database becomes a static data
structure where access and search procedures can be
tailored to the current structure.


The first argument of each relation is designated as a
primary key. When a primary key is bound to a value the
relation defines a unique mapping of the primary key to the
secondary keys in the relation. Secondary keys, when
bound in a relation, return all primary keys that satisfy the
relation for all secondary keys specified. Any key, primary
or secondary, may take on a don't care value during the
search and any unbound key will be bound to a value when
all bound keys are satisfied.
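
The key-binding behavior described above can be illustrated with a small Python sketch. It is not the MIDAS implementation: the relation layout (image name, sensor, scene type, owner), the sample tuples, and the names query, ANY, and IMAGE are all hypothetical and serve only to show how bound and don't care keys interact.

```python
ANY = None  # hypothetical marker for a don't care (unbound) key

def query(relation, *pattern):
    """Return every tuple of the relation that agrees with all bound keys."""
    return [row for row in relation
            if len(row) == len(pattern)
            and all(key is ANY or key == value
                    for key, value in zip(pattern, row))]

# Illustrative relation; the first field plays the role of the primary key.
IMAGE = [
    ("img1", "color", "cityscape", "Ohlander"),
    ("img2", "color", "cityscape", "Ohlander"),
    ("img3", "multispectral", "terrain", "other"),
]

# Binding only secondary keys returns all matching primary keys.
cityscapes = [row[0] for row in query(IMAGE, ANY, "color", "cityscape", "Ohlander")]

# Binding the primary key returns the unique tuple, which binds the rest.
img3 = query(IMAGE, "img3", ANY, ANY, ANY)
```

Because the relational database is static once compiled, a real implementation could index each key ahead of time; the linear scan here is only for clarity.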

Applications. There are several areas of research
interest which require the availability of organized signal
and symbolic data: performance evaluation, error analysis,
learning, and knowledge representation. However, the main
motivation for this work arises from the belief that uniform
representation and organization of images allow researchers
to evaluate algorithms and techniques over a wide variety of
images without the burden of redeveloping specialized tools
for representation. Within our research environment we
have several hundred images of various types ranging from
earth satellite multi-spectral pictures to electron
photomicrographs of ganglia. These images are generated by
a variety of sensors including flying spot scanner, color
scanner, side looking radar, electron beam, and LANDSAT
multi-spectral scanners. Thus, from a practical standpoint,
the database serves an important function if only to keep
track of what pictures are available and what previous
processing has been performed.

Performance  evaluation  involves  determining  how 
closely  a machine  generated  description  matches  an 
idealized  description  of  the  image.  We  shall  discuss  this  in 
detail  in  the  following  sections. 

Error analysis is an extension of performance
evaluation in that we must be able to describe the nature of
errors which occur in automatic scene descriptions. Given
mismatched or omitted objects one can begin to localize and
diagnose sources of errors in segmentation, feature
extraction, and labeling by comparing machine results with
the corresponding descriptions within the database. This
allows us to localize and identify possible causes of the
error.

Programs to learn structural and feature descriptions
of objects can be developed given specific exemplars from a
variety of images. The automatic learning of symbolic
feature primitives (signal to symbol transformations) is
essential for progress towards general image understanding
systems. MIDAS provides a uniform representation for a
large variety of objects occurring in various types of images.
In the rest of this paper we shall describe some of the
techniques under consideration for performance evaluation.

Performance  Evaluation 

Performance evaluation of image understanding
systems requires the determination of how closely the image
interpretation produced by a program matches that of a
human. Most systems are developed and debugged on a
small set of images and little or no validation of results is
performed outside of the working set. Where validation is
performed it usually is a subjective analysis rather than
along quantitative dimensions. Perhaps this indicates how
little we understand about the problem. In this section we
will outline our approach to systematic study of
performance evaluation. Much of our work is preliminary
and our ideas and techniques will surely change with
experience; however, we believe progress in this area is
essential for the long term success of image understanding
programs.

Requirements. The first requirement for any type of
performance analysis is the ability to generate "the truth"
about the image. This ideal scene description (ISD)
corresponds to what we feel our programs should produce
as their output. The ISD comes in many flavors; since
symbolic representations are possible at many levels of
abstraction, performance of the system can be measured
independently at each level. Each level then has its own ISD
which must be represented and evaluated in order to
measure the goodness of knowledge sources used in image
interpretation. For example, at the scene level it might be
acceptable if the scene was correctly identified as an office
scene, even though some objects within the scene were
incorrectly labeled. At a lower level, the evaluation of edge
position or region boundaries would be necessary and the
ISD would be stated in terms of these features. The
evaluation of misplaced, omitted, or added boundaries would
be appropriate to provide performance data at these levels.

Problems. There are several problems which are
encountered when one attempts to compare scene ISDs with
machine generated descriptions (MGD). First, our
representations are not always correct. There must be
facilities to refine and continuously update these idealized
descriptions. These representations are created using
SYRIUS, a system for interactive guidance in the generation
of symbolic descriptions (Smith and Reddy, 1977). SYRIUS
provides convenient means for building ISDs, creating or
modifying segmentation masks, and extracting and cataloging
attributed features in conjunction with MIDAS. Second, it is
not clear that there is only one correct interpretation.
Often alternate interpretations of image feature descriptions
(especially at the lower levels) are equally acceptable.

Methods. Even given accurate representations, the
comparisons and measurements that we wish to make are
not straightforward. At each level we must understand what
changes are allowable or relevant and determine what
knowledge can be brought to bear to aid in the analysis.
For example, at the segmental level there are several
phenomena to be accounted for. First, where an edge or
boundary exists in the ISD, the distance of the same edge in
the MGD must be computed and scored. In addition, the
frequency of omitted and extra edges in the MGD must be
tabulated and scored. The success of an MGD then should be
measured as a function of the frequency of extra and
omitted edges and a figure of merit for the distance
between edges in the ISD and MGD. The implication here is
that the ISD representation must allow for uncertainty in the
position of edges. This distance is the area between the
ISD edge and its counterpart in the MGD.

Figure 4 illustrates the use of uncertainty intervals in
the ISD edge descriptions. No penalty is imposed while the
MGD edge remains within the interval; however, its distance
is measured and scored. If multiple edge points are found
within the interval they are scored as extra edges. The lack
of an edge within the ISD interval counts as a missing edge.
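
A one-dimensional version of this edge bookkeeping can be sketched as follows. It is an illustration only: edges are reduced to positions along a scan line, the distance is taken as the absolute position difference rather than the area between edges, MGD edges falling outside every interval are also counted as extra, and the function name and data layout are assumptions.

```python
def score_edges(isd_edges, mgd_positions):
    """Compare MGD edge positions with ISD edges given as (center, half_width)."""
    missing = 0
    extra = 0
    distances = []
    claimed = set()
    for center, half_width in isd_edges:
        inside = [i for i, p in enumerate(mgd_positions)
                  if abs(p - center) <= half_width and i not in claimed]
        if not inside:
            missing += 1                       # no MGD edge in the interval
            continue
        best = min(inside, key=lambda i: abs(mgd_positions[i] - center))
        distances.append(abs(mgd_positions[best] - center))
        extra += len(inside) - 1               # additional edges in the interval
        claimed.update(inside)
    extra += len(mgd_positions) - len(claimed)  # edges outside every interval
    return {"missing": missing, "extra": extra, "distances": distances}
```

A figure of merit could then combine the three returned quantities; the source leaves the exact combination open, so none is proposed here.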

Labeling accuracy can be measured by comparing, on
a point by point basis, the primitive picture element (PPE)
(Rubin and Reddy, 1977) assigned in the MGD to that in the
ISD. Typical analysis would include the percent of pixels
that were correctly labeled and a confusion matrix. A
confusion matrix tabulates the frequency at which one PPE
is mislabeled as another. This matrix can be used to tune
the feature descriptors which define the PPE or indicate
when PPEs should be split or grouped into new classes.
Some preliminary results using data produced by the ARGOS
system (Rubin, 1977) are given in Figure 5. The frequency
distribution of PPEs for both training and labeling phases
are shown along with the confusion matrix. We plan to
improve this analysis by measuring the feature distances for
PPEs which are frequently confused.
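
The percent-correct figure and the confusion matrix can be computed together in a few lines. The sketch below assumes the ISD and MGD labels are given as two equal-length sequences, one label per pixel; the function name and the example labels are hypothetical.

```python
from collections import Counter

def labeling_report(isd_labels, mgd_labels):
    """Return (percent correct, confusion counts keyed by (ISD label, MGD label))."""
    if len(isd_labels) != len(mgd_labels) or not isd_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    confusion = Counter(zip(isd_labels, mgd_labels))
    correct = sum(count for (truth, guess), count in confusion.items()
                  if truth == guess)
    return 100.0 * correct / len(isd_labels), confusion

# Illustrative labels only.
isd = ["sky", "sky", "tree", "tree", "road", "road", "road", "sky"]
mgd = ["sky", "tree", "tree", "tree", "road", "sky", "road", "sky"]
percent, confusion = labeling_report(isd, mgd)
```

Off-diagonal entries with large counts point to pairs of PPEs whose feature descriptors may need tuning, splitting, or merging.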

At the region level it is also possible to use simple
techniques such as pixel counting to determine how well the
image was partitioned into regions. We propose the
following procedure for determining the relative goodness
of alternative region partitions. It is necessary to register
regions in the MGD with those in the ISO and count the
number of pixels covered and missed by the MGD regions.
Registration is performed by calculating the number of
pixels which overlap between the ISO and MGD regions on a
pairwise basis. An initial covering of MGD regions onto ISO
regions is made such that the mapping is 1 to 1 and
each of the ISO regions has been assigned the MGD region
which covers the greatest area of that ISO region. Any
remaining MGD regions are mapped so that they maximize
the number of ISO region points covered. Once all MGD
regions have been registered, the number of pixels covered
and missed can be counted. A figure of merit can be
determined from the percent of area correctly covered and
the number of missing or extra regions in the MGD.
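
One possible reading of this registration procedure is sketched
below. It is an assumption-laden illustration, not the authors'
implementation: regions are represented as sets of (row, column)
pixels, the 1-to-1 covering is built greedily from the largest
pairwise overlaps, and the names `register_regions` and
`coverage` are hypothetical.

```python
def register_regions(iso, mgd):
    """Map each MGD region name to an ISO region name using pixel overlap."""
    overlaps = sorted(
        ((len(iso[i] & mgd[m]), i, m) for i in iso for m in mgd),
        key=lambda item: item[0],
        reverse=True,
    )
    mapping, used_iso = {}, set()
    # Initial covering: greedy 1-to-1 assignment by largest overlap (assumed).
    for count, i, m in overlaps:
        if count > 0 and i not in used_iso and m not in mapping:
            mapping[m] = i
            used_iso.add(i)
    # Remaining MGD regions go to the ISO region they cover best.
    for m in mgd:
        if m not in mapping:
            best = max(iso, key=lambda i: len(iso[i] & mgd[m]))
            if iso[best] & mgd[m]:
                mapping[m] = best
    return mapping


def coverage(iso, mgd, mapping):
    """Return (pixels covered, pixels missed) over all ISO regions."""
    covered = 0
    for i, pixels in iso.items():
        assigned = set()
        for m, target in mapping.items():
            if target == i:
                assigned |= mgd[m]
        covered += len(pixels & assigned)
    total = sum(len(pixels) for pixels in iso.values())
    return covered, total - covered


# Hypothetical 4 x 4 image: two ISO regions, three MGD regions.
iso = {
    "A": {(r, c) for r in range(4) for c in (0, 1)},
    "B": {(r, c) for r in range(4) for c in (2, 3)},
}
mgd = {
    "r1": {(r, 0) for r in range(4)},
    "r2": {(r, 1) for r in range(3)},
    "r3": {(r, c) for r in range(4) for c in (2, 3)} | {(3, 1)},
}
mapping = register_regions(iso, mgd)
covered, missed = coverage(iso, mgd, mapping)
extra_regions = len(mgd) - len(iso)
```

A figure of merit could then combine `covered`, `missed`, and
`extra_regions`; the weighting is left open, as in the text.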

Figure 6 gives an example of this procedure for two
alternative machine descriptions: MGD I and MGD II. The
shaded area in the registered MGDs corresponds to the
number of pixels which were missed. The ratio of missed to
covered pixels is one relative measure of goodness. Using
this criterion, MGD I (40/256) is a better region description
than MGD II (56/256). However, note that MGD I generated
two extra regions while MGD II produced only one more
than the ISO. Penalties must be factored into the scoring to
handle cases where an MGD with a large number of extra
regions covers the ISO better than one with the exact
number of regions.

Conclusions

We have illustrated how the MIDAS sensor database
can be used in the performance evaluation domain. The
examples described are the beginning of a set of
performance evaluation tools currently being developed.

FIGURE 4 Edge distance metric


ABSTRACT

Analysis of aerial images and
location of small structures is a complex
task. However, larger objects can be
conveniently located by using segmentation
techniques best suited for their component
extraction. For example, edge techniques
are suitable for extraction of
roads and region techniques for lakes
and rivers. Specific objects of interest
may be located by their relationships with
these more easily extracted objects.
Initial results of work in progress are
presented.


INTRODUCTION 

Analysis of aerial images is, in
general, a complex task. The reasons for
such complexity are many and varied. A
prime cause is the presence of texture,
which causes difficulties for low
level processes such as edge detection and
segmentation. Another source of difficulty
is that the desired objects and structures
may be small compared to the size of
a complete image. A detailed analysis of
a complete high resolution aerial image is
generally prohibitive because of the
computational costs.

For many applications, however, a
complete and general analysis is unnecessary.
Specific structures of interest may
have special properties, known a priori,
that allow for their easy extraction. The
search for small structures is helped by
locating them through their spatial
relationships to larger, more easily
located structures.

In previous work, we compared two
segmentation techniques, the edge based and
the region based methods, and concluded
that one or the other may be suited for
extraction of particular types of
structures [1]. This paper describes our
initial attempts to use both techniques,
taking advantage of their respective
strong points.

*This research was supported by the
Advanced Research Projects Agency of the
Department of Defense and was monitored by
the Wright Patterson Air Force Base under
Contract F-33615-76-C-1203 ARPA Order No.
3119.

PROBLEM  DESCRIPTION  AND  REPRESENTATION 

The problem approached is that of
finding user specified structures in
aerial images. The user specifies the
properties useful for the location of the
desired structure and also of other
related structures. (An interactive,
question-answer dialog system is being
developed to facilitate interaction with
a user; see [2].) This amount of a priori
knowledge is likely to be available in
many applications of guidance and
photo-interpretation.

The a priori information is stored as
properties of objects and their relationships
to each other, and may be viewed as
constituting a graph structure with the
objects as nodes and relationships as arcs.
The properties and relationships will, in
general, need to be unrestricted.
Currently, an object is described either by a
collection of line segments or by its
region properties. Line segments are
described by their length and width. The
regions are described by properties such
as brightness (color) and simple shape
measures (area, perimeter, ratio of area
to perimeter squared, elongation, etc.).

The relationships used are those of
relative locations of the different
objects and the symbolic relationships of
left, right, above and below. Other
relationships such as symmetry and
similarity are obviously useful, but have not
been implemented.
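
The graph view of the a priori knowledge can be illustrated with a
small data structure. The sketch below is hypothetical throughout:
the class names, property keys, and the example objects and
relations are assumptions chosen for illustration and do not
describe the authors' program or the actual scene layout.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    name: str
    kind: str                                   # "line" or "region"
    properties: dict = field(default_factory=dict)


@dataclass
class SceneModel:
    objects: dict = field(default_factory=dict)     # nodes, keyed by name
    relations: list = field(default_factory=list)   # arcs: (subject, relation, object)

    def add_object(self, obj):
        self.objects[obj.name] = obj

    def relate(self, subject, relation, other):
        if subject not in self.objects or other not in self.objects:
            raise KeyError("both objects must be added before relating them")
        self.relations.append((subject, relation, other))

    def related_to(self, name):
        """Return (subject, relation) pairs whose arc points at `name`."""
        return [(s, r) for s, r, o in self.relations if o == name]


# Illustrative model only; values are made up for the example.
model = SceneModel()
model.add_object(SceneObject("river", "region", {"brightness": "dark"}))
model.add_object(SceneObject("highway", "line", {"width": "narrow"}))
model.add_object(SceneObject("docks", "line", {"length": "short"}))
model.relate("docks", "left", "river")
model.relate("highway", "above", "river")
```

Once the easily extracted objects have been matched to nodes, the
arcs indicate where to search for the remaining ones.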

Our representation and use of knowledge
is similar to that described by
Tenenbaum [3]. The principal difference
is in Tenenbaum's use of single pixel
attributes to uniquely distinguish objects
(in a given context). We use object
attributes to aid in the segmentation of
the image and then use the attributes of
larger, segmented parts for recognition.

FEATURE EXTRACTION AND SEGMENTATION

Feature extraction and segmentation are
guided by the properties of the desired
objects to be extracted. Thus, an edge
detection-line finding process is applied
to extract desired linear segments (such
as roads) and a region segmentor is applied
to extract areas uniform in some property
(for example, lakes and other bodies of
water).

Consider the aerial image shown in
figure 1 (the displayed image contains
352 x 352 pixels; an image of twice the
resolution is also used in the analysis).
Here, an objective may be to locate the
dock structure and perhaps some ships in
it. As this structure consists of
relatively small parts and is complex, it may
be easier to extract related structures
such as the river, the major highway and
the lakes first, and use these to
concentrate the search for docks on a smaller
area of the image. (We assume such
information is supplied by the user. No
attempt has been made to automate the
strategy generation process, as in [4].)

Edge detection processes are
appropriate for the extraction of the desired
roads. Figure 2 shows the results of
applying a Hueckel edge detector [5] to the
image of figure 1 and linking the resulting
edge segments into elongated segments [6].
The road is known to be narrow enough that
the edges corresponding to it are of the
"line" type (as contrasted with a step
edge). Restricting the linked edges to be
only of line type results in fewer segments
(shown in figure 2).

The lakes and parts of the river are
conveniently extracted by using the
Ohlander-Price Segmentor [7]. It is known
that the desired objects are relatively
dark and uniform in intensity, and the dark
peak in the intensity histogram should be
used for segmentation. Figure 3 shows the
intensity histogram for this image. The
completed segmentation is shown in figure 4.
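
The idea of segmenting on the dark histogram peak can be
illustrated with a deliberately simplified stand-in. The sketch
below is not the Ohlander-Price segmentor; it assumes a clean,
unsmoothed histogram of integer gray levels, and the function name
`dark_peak_threshold` is hypothetical.

```python
def dark_peak_threshold(pixels, levels=256):
    """Return (peak, valley): the darkest histogram peak and the last gray
    level before the histogram starts to rise again after that peak."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    peak = None
    for g in range(levels):
        left = hist[g - 1] if g > 0 else 0
        right = hist[g + 1] if g < levels - 1 else 0
        # First local maximum when scanning from the dark end.
        if hist[g] > 0 and hist[g] >= left and hist[g] > right:
            peak = g
            break
    if peak is None:
        raise ValueError("histogram has no peak")
    valley = peak
    # Walk toward brighter levels while the histogram does not increase.
    while valley + 1 < levels and hist[valley + 1] <= hist[valley]:
        valley += 1
    return peak, valley


# Hypothetical pixel values: a dark cluster (water) and a bright cluster.
pixels = [10, 11, 10, 12, 11, 10, 200, 201, 200, 199, 202, 200]
peak, valley = dark_peak_threshold(pixels)
dark_pixels = [p for p in pixels if p <= valley]
```

On real imagery the histogram would need smoothing before peak
selection, and connected dark pixels would still have to be grouped
into regions.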

MATCHING  OF  SEGMENTS 

The next step is to match the derived
line segments and regions with a model of
the image. This phase of our work is in
progress and experimental results are
expected to be available soon. Assuming
that the derived segments are distinctive
enough to be easily distinguished, the
approximate location of the dock structures
can be predicted. Sensitive line detectors
should then help locate the piers of the
dock. (We have found the Hueckel edge
detector to be deficient in locating small
edges, perhaps because of the large
neighborhood size used. Development of more
sensitive edge and line detectors is being
carried out concurrently; see [8].)

CONCLUSIONS 

Some results of processing a complex
aerial image using both the line and the
region based techniques have been shown.
It appears that the use of simple
techniques, specifically suited to particular
objects in an image, may allow useful
processing of rather complex images. This
work is in initial stages of development
and the array of segmentation attributes
is limited. While it is hoped that the
described techniques have general
applicability, our experience with real images
is, as yet, limited.


ABSTRACT

The Fourier Descriptor method is a well-known
method of describing the shape of a closed figure.
Previous work with Fourier Descriptors has suffered
from either a loss of shape information, or
excessive computation time in comparing an unknown
contour to a known one. This paper presents a
technique for normalizing Fourier descriptors
which retains all shape information, and is
computationally efficient.

In recognizing three-dimensional objects,
most of the computation time is typically spent in
computing distances between an unknown feature
vector and a library of feature vectors
representing the objects of interest. An interpolation
property of Fourier descriptors is described which
permits a substantial reduction in the density of
projections representing a three-dimensional
object. Preliminary experimental results are
presented in which this algorithm is applied to
recognition of aircraft outlines.


FOURIER  DESCRIPTORS 

The Fourier Descriptor (FD) is one method of
describing the shape of a closed, planar figure.
Given a figure in the complex plane, the contour
can be traced, yielding a (one-dimensional)
complex function of time. If the contour is traced
repeatedly, the periodic function which results
can be expressed in a Fourier series. The FD of a
contour is defined to be this Fourier series.

To implement this method of shape description,
it is necessary to sample the contour at a
finite number of points. Since the discrete
Fourier transform of a sequence gives us the
values of the Fourier series coefficients of the
sequence, assuming it to be periodic, using an FFT
algorithm satisfies the definition above. The
computational advantages of the FFT are well
known.
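
As a minimal sketch of this definition, the code below computes the
FD of a sampled contour with a direct O(N^2) DFT from the Python
standard library; an FFT would be used in practice. The 1/N scaling
(so that the output equals the Fourier series coefficients), the
function names, and the sampled ellipse are assumptions made for
the example.

```python
import cmath
import math


def fourier_descriptor(contour):
    """DFT of complex contour samples, scaled by 1/N."""
    n = len(contour)
    return [
        sum(z * cmath.exp(-2j * cmath.pi * k * t / n) for t, z in enumerate(contour)) / n
        for k in range(n)
    ]


def coefficient(fd, k):
    """Coefficient A(k) for a signed frequency k (negative k wraps around)."""
    return fd[k % len(fd)]


# Sampled ellipse with semi-axes 3 and 1, traced counterclockwise.
n = 16
ellipse = [
    complex(3 * math.cos(2 * math.pi * t / n), math.sin(2 * math.pi * t / n))
    for t in range(n)
]
fd = fourier_descriptor(ellipse)
```

For this ellipse only A(1) and A(-1) are nonzero, with values 2 and 1,
since 3 cos(x) + j sin(x) = 2 exp(jx) + exp(-jx).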

Once the Fourier descriptor has been computed,
the operations of rotation, scaling, and moving
the starting point are easily implemented in
the frequency domain by simple arithmetic on the
frequency domain coefficients. While shapes may
be compared in the space domain, the procedures
required to adjust their size and orientation are
computationally very expensive. Normally an
iterative type of algorithm is employed, which
searches for an optimum match between the unknown
shape and the reference set.

Granlund's [1] approach to shape information
extraction involves defining "Fourier descriptors"
by considering products of Fourier series
coefficients which are shown to be invariant to
position, size, orientation, and starting point
factors. This results in an increase in data
dimensionality from N to N^2/2, without any change in
total information. (Since the FFT is a reversible
linear transformation, all the shape information
is contained in the original N coefficients.)
Granlund also computed his FD coefficients using
digital or analog integration techniques, which
are quite expensive computationally.

Persoon and Fu [2] retained all the shape
information inherent in the original contour, and
reduced the problem to one of finding an optimum
size, orientation, and starting point match
between each sample FD and the reference FD. This
optimization process was used to check the
similarity of each of the possible reference contours.
While this method does retain all of the shape
information, and guarantees to find an optimum size,
orientation, and starting point match, it is still
fairly time consuming. In addition, the FDs
themselves were computed by an integration process.

Richard and Hemami [4] computed the FDs using
an FFT, and then normalized their magnitudes only.
They were able to perform magnitude
classifications efficiently using this technique, but the
true distance measure, including all shape
information, required two FFTs for each comparison of
an unknown FD to a reference FD.

This paper describes an algorithm which
computes FDs efficiently using the FFT, and
normalizes the FFT output vector to a standard size,
orientation, and starting point before comparing
it to any reference FDs. While exact minimization
of the distance between two arbitrary shapes is
not guaranteed, if the shapes are similar enough
to warrant an identification of the unknown shape,
the distance found will be very close to the
minimum. A general normalization algorithm is
presented, and additional theorems are presented
for the case of contours possessing bilateral
symmetry. The classification problem is also
discussed, and relationships between contours and
their FDs are investigated.

NORMALIZATION 

The frequency domain operations which affect
the position, size, orientation, and starting
point of the contour follow directly from
properties of the DFT. To change the position of a
contour, just vary the zero frequency (DC)
coefficient of the FD. Adding a complex constant to
every point in the time domain representation of a
contour is equivalent to adding that value to the
DC term of the DFT.

To change the size of the contour, the
components of the FD are simply multiplied by a
constant. Due to linearity, the inverse transform
will have its coordinates multiplied by the same
constant.

To rotate the contour in the time domain
simply requires multiplying each coordinate by
$e^{j\theta}$, where $\theta$ is the angle of rotation.
Again by linearity, the constant $e^{j\theta}$ has the
same effect when the frequency domain coefficients
are multiplied by it.

To see how the contour starting point can be
moved in the frequency domain, recall the time
shifting property of the DFT. Shifting the
starting point of the contour in the time domain
corresponds to multiplying the ith frequency
coefficient in the frequency domain by $e^{jiT}$, where T
is the fraction of a period through which the
starting point is shifted. (As T goes from 0 to
$2\pi$, the starting point traverses the whole contour
once.)
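
The four frequency domain operations can be checked numerically.
The following sketch assumes a 1/N-scaled DFT and stores the FD as
a list in DFT order; the helper names and the sample contour are
hypothetical, and a plain DFT stands in for the FFT.

```python
import cmath


def dft(samples):
    """Direct DFT scaled by 1/N (Fourier series coefficients)."""
    n = len(samples)
    return [
        sum(z * cmath.exp(-2j * cmath.pi * k * t / n) for t, z in enumerate(samples)) / n
        for k in range(n)
    ]


def signed_frequency(index, n):
    """Map a DFT index 0..N-1 to a frequency in -(N/2)+1 .. N/2."""
    return index if index <= n // 2 else index - n


def translate(fd, offset):
    out = list(fd)
    out[0] += offset            # only the DC term changes
    return out


def scale(fd, factor):
    return [factor * a for a in fd]


def rotate(fd, theta):
    return [cmath.exp(1j * theta) * a for a in fd]


def shift_start(fd, T):
    """Move the starting point by the fraction T / (2*pi) of the contour."""
    n = len(fd)
    return [a * cmath.exp(1j * signed_frequency(i, n) * T) for i, a in enumerate(fd)]


# Arbitrary closed contour with eight samples (illustrative values).
contour = [2 + 0j, 1 + 1j, 0 + 2j, -1 + 1j, -2 + 0j, -1 - 1j, 0 - 1j, 1 - 1j]
fd = dft(contour)
shifted = contour[3:] + contour[:3]          # start three samples later
T = 2 * cmath.pi * 3 / len(contour)
```

Each operation costs one pass over the coefficients, which is why
normalization in the frequency domain is inexpensive.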

Given the FD of an arbitrary contour, the
normalization procedure requires performing the
normalization operations such that the contour has
a standard size, orientation, and starting point.
The following method of FD normalization preserves
all of the shape information while rejecting noise
effectively. In order to reject noise, the
coefficients used in the procedure are chosen to have
as large magnitudes as possible.

First, we require the phases of the two
largest coefficients to be zero. A(1) will always be
the largest, with magnitude unity due to the scale
normalization procedure which defines that
magnitude. Let the second largest coefficient be A(k).
(The frequencies of the coefficients produced by
an FFT of length N range from -(N/2)+1 to N/2.)
The normalization multiplicity M of coefficient
A(k) is defined as:

M = |k-1|

Thm: The requirement that A(1) and A(k) have zero
phase angle can be satisfied by M different
orientation/starting point combinations.

Proof: Use the two allowable operations to arrive
at one orientation and starting point which gives
zero phase for A(1) and A(k). Next use the
starting point movement operation (multiplication of
the ith coefficient by $e^{jiT}$) to move the starting
point once around the entire contour. To
accomplish this T must range from 0 to $2\pi$. Now
consider the two cases k positive and k negative.
If k is positive, the phases of A(1) and A(k) will
coincide at k-1 different starting points. But at
each of these starting points, we can use the
orientation operation (multiplication of each
coefficient by $e^{j\theta}$) to reduce the phases to zero.
Similarly, if k is negative, the phases of A(1) and
A(k) will coincide at 1-k different starting
points. Again, the orientation operation can
reduce the phases to zero.

Note that if k = 2, the orientation and
starting point are defined uniquely. In general,
however, A(2) will not be the second largest
coefficient in magnitude, so this ambiguity must be
resolved to achieve a general procedure.

The obvious method of solving this problem is
to check the phase of a third coefficient A(p) at
each of the M possible orientation/starting point
combinations and choose the normalization which
gives a phase closest to zero for this
coefficient. However, this ambiguity-resolving
coefficient cannot be chosen arbitrarily. If the
normalization multiplicity of coefficient A(p) is the
same as that of A(k), or a multiple of it, the
phase of A(p) will be the same at each possible
normalization. If M for coefficient A(p) (denoted
M[p]) is a factor of M[k], or a multiple of a
factor of M[k] less than M[k], there is also
ambiguity since some of the M possible normalizations
will result in identical phases for A(p). If
these ambiguous coefficients are removed from
consideration, and the unambiguous coefficient with
the largest magnitude is used to select one of the
M allowable normalizations, a general procedure is
obtained.

To briefly review the entire normalization
procedure, we start by dividing each coefficient
by the magnitude of A(1) to normalize the size of
the contour. We find the coefficient of second
largest magnitude and compute its normalization
multiplicity. We then locate the third largest
coefficient suitable for resolving the ambiguity
(A(p)) as explained above. The orientation and
starting point are adjusted to satisfy the
restrictions that A(1) and A(k) are real and
positive, and A(p) has phase as close to zero as
possible.
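
The review above can be turned into a short sketch. This is one
possible reading, not the authors' subroutine: the FD is assumed to
be a dictionary from signed frequency to complex coefficient with
A(1) present, the DC term is simply dropped to normalize position,
and a coefficient A(p) is treated as unambiguous when
gcd(|p-1|, M) = 1, which is how the factor rule stated earlier is
interpreted here. The name `normalize_fd` is hypothetical.

```python
import cmath
import math


def normalize_fd(fd):
    """Normalize size, orientation, and starting point of an FD."""
    a = {i: v for i, v in fd.items() if i != 0}      # drop DC (position)
    size = abs(a[1])
    a = {i: v / size for i, v in a.items()}          # |A(1)| becomes 1
    others = [i for i in a if i != 1]
    k = max(others, key=lambda i: abs(a[i]))         # second largest coefficient
    m_k = abs(k - 1)                                 # normalization multiplicity
    usable = [p for p in others if p != k and math.gcd(abs(p - 1), m_k) == 1]
    p = max(usable, key=lambda i: abs(a[i])) if usable else None
    phi_1, phi_k = cmath.phase(a[1]), cmath.phase(a[k])
    best = None
    for m in range(m_k):
        # Starting point shift T and rotation theta that zero both phases.
        T = (phi_1 - phi_k + 2 * math.pi * m) / (k - 1)
        theta = -phi_1 - T
        candidate = {i: v * cmath.exp(1j * (theta + i * T)) for i, v in a.items()}
        cost = abs(cmath.phase(candidate[p])) if p is not None else 0.0
        if best is None or cost < best[0]:
            best = (cost, candidate)
    return best[1]


# Illustrative FD and a scaled, rotated, start-shifted copy of it.
fd = {
    1: 2.0 * cmath.exp(0.3j),
    -2: 0.8 * cmath.exp(1.1j),
    2: 0.3 * cmath.exp(-0.7j),
    -1: 0.1 * cmath.exp(2.0j),
    3: 0.05 + 0j,
}
moved = {i: 2.5 * v * cmath.exp(1j * (0.9 + i * 1.7)) for i, v in fd.items()}
nfd = normalize_fd(fd)
nfd_moved = normalize_fd(moved)
```

Both inputs describe the same shape, so the two normalized FDs
should agree to within rounding error.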

This method is quite powerful, but a slight
modification in the procedure has been found
helpful in those cases in which there are two or more
coefficients suitable for normalization with
almost the same magnitude. It is very unlikely that
the magnitudes will be identical, but if they are
even close, noise may cause one of them to be used
to normalize the test FD, and the other to
normalize the unknown FD. To overcome this, the
normalization coefficients used to normalize the test FD
can be supplied to the normalization subroutine
directly, rather than having the subroutine
compute them.

CLASSIFICATION METHODS

Given two normalized FDs (NFDs), how do we
measure their degree of similarity? An
appropriate classification method is essential if we are
to compare unknown shapes to a test set.

Consider two sampled contours a(i) and b(i),
and define the difference c(i) = a(i) - b(i).
Evidently if a(i) and b(i) are identical, c(i) is
identically zero. If a(i) and b(i) are not
identical, the magnitudes of the c(i) coefficients are
a reasonable measure of the difference between
a(i) and b(i). Now consider the frequency domain
vectors corresponding to a(i), b(i), and c(i),
denoted A(i), B(i), and C(i). Due to linearity,
we have C(i) = A(i) - B(i). Applying Parseval's
theorem to the difference vector, we find that the
sum of the squares of the differences of the real
and imaginary parts of each coefficient of two FDs
is proportional to their point by point mean
square error in the space domain. The mean square
distance measure in the frequency domain is seen
to correspond to a reasonable time domain
criterion which weights each point equally. In
recognizing a contour corrupted by such factors as
quantization error or poor photographic
resolution, such a criterion seems appropriate. The
effectiveness of this classification method is
demonstrated by the experiments described below.
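
The Parseval relation behind this distance measure is easy to
verify numerically. The sketch below assumes a 1/N-scaled DFT, for
which the summed squared difference of the coefficients equals the
point by point mean square error exactly; the helper names and the
two sample contours are hypothetical.

```python
import cmath


def dft(samples):
    """Direct DFT scaled by 1/N (an FFT would be used in practice)."""
    n = len(samples)
    return [
        sum(z * cmath.exp(-2j * cmath.pi * k * t / n) for t, z in enumerate(samples)) / n
        for k in range(n)
    ]


def ms_error_time(a, b):
    """Point by point mean square error between two sampled contours."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b)) / len(a)


def ms_distance_freq(fa, fb):
    """Sum of squared real and imaginary differences of the coefficients."""
    return sum(abs(x - y) ** 2 for x, y in zip(fa, fb))


# Two similar eight-point contours (illustrative values).
a = [2 + 0j, 1 + 1j, 0 + 2j, -1 + 1j, -2 + 0j, -1 - 1j, 0 - 2j, 1 - 1j]
b = [2.1 + 0j, 1 + 0.9j, 0 + 2j, -1.2 + 1j, -2 + 0.1j, -1 - 1j, 0.1 - 2j, 1 - 1.1j]
time_error = ms_error_time(a, b)
freq_distance = ms_distance_freq(dft(a), dft(b))
```

With an unscaled DFT the two quantities differ by a constant factor,
which is the proportionality mentioned in the text.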

Since the closed contour is a continuous
function, the Fourier series converges fairly
rapidly, as would be expected. Most of the m.s.
distance between two FDs is due to relatively few
coefficients, and the classifications reported
below use no more than 30 coefficients.


FD - CONTOUR RELATIONSHIPS

If a FD consists of coefficient A(1) only,
with all other coefficients zero, it will
transform back to the time domain as a sampled
circle. Higher frequency coefficients also
transform back as sampled circles, but they
traverse the circle a number of times. A(k) will
yield a time domain sequence which traces a
sampled circle k times in the counterclockwise
direction. A(k) and A(-k) together yield a
sampled ellipse, in a manner analogous to the
elliptical polarization of electromagnetic theory.

Due to linearity, a contour in the time
domain consists of a sum of the inverse transforms
of its FD coefficients. Hence this view of each FD
coefficient as a sampled "phasor" yields insight
into the relationships between a contour and its
FD. A(1) is the fundamental frequency coefficient
which is always the largest in magnitude, and is
forced to have magnitude unity by the magnitude
normalization procedure. It is of interest to
describe the figures generated by A(1) and A(k)
combined, with all other coefficients zero, since
often most of the "energy" of a FD is contained in
as few as two coefficients. Interestingly enough,
the "normalization multiplicity" M defined above
plays a part here, with the contour resulting from
nonzero A(1) and A(k) having M[k]-fold rotational
symmetry. Granlund observed that contours with k-fold
rotational symmetry consist of components whose
frequencies are multiples of k-1. If k is negative,
the contours are similar to polygons, and if k is
positive, the contours generally are quite round,
and appear to be loops superimposed on a circle.
If k = -1, the contour is of course a sampled
ellipse. Most contours of interest taken from
actual photographic data have a negative frequency
coefficient as the second largest in magnitude.
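
These relationships can be reproduced by inverse transforming a few
sparse FDs. In the sketch below the FD is assumed to be a
dictionary from signed frequency to coefficient; the function name
`contour_from_fd`, the coefficient values, and the choice of 12
samples are illustrative assumptions.

```python
import cmath


def contour_from_fd(fd, n):
    """Inverse transform: sum the sampled phasors of each coefficient."""
    return [
        sum(a * cmath.exp(2j * cmath.pi * k * t / n) for k, a in fd.items())
        for t in range(n)
    ]


n = 12
circle = contour_from_fd({1: 1.0}, n)                # A(1) only
ellipse = contour_from_fd({1: 1.0, -1: 0.5}, n)      # A(1) and A(-1)
three_fold = contour_from_fd({1: 1.0, -2: 0.3}, n)   # k = -2, so M = |k-1| = 3

# Advancing one third of the way around the contour should equal a
# rotation by 2*pi/3 when the contour has 3-fold rotational symmetry.
step = n // 3
turn = cmath.exp(2j * cmath.pi / 3)
```

The ellipse has semi-axes 1.5 and 0.5, the sum and difference of the
two coefficient magnitudes.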

Figure 1 shows four contours whose FDs have
only two nonzero coefficients. The magnitudes of
these coefficients completely determine
the shape of the figure generated, with the phases
only affecting orientation and starting point.
Note also that the uniform sampling condition in
the time domain is not satisfied when any
arbitrary FD is inverse transformed.

Consider now a bilaterally symmetric contour
in the time domain. It can be shown [5] that a
Fourier Descriptor represents a bilaterally
symmetric contour iff the rotation and starting point
shift operations can be performed such that the
imaginary part of each FD coefficient (except
A(0)) is zero.


2-DIMENSIONAL  AIRCRAFT  RECOGNITION 

This method of extracting shape information
was experimentally tested on 20 airplane
silhouettes which were digitized to two different
resolutions. The high resolution versions were
quite accurate representations of the aircraft,
while the low resolution
versions showed significant distortion of some of
the smaller features such as engines. Using the
high resolution contours as a test set, an attempt
was made to classify the low resolution contours
using this FD algorithm. Using a mean square
distance measure, 95% classification accuracy was
attained. The aircraft were of four different
types. Figures 2 and 3 show high and low
resolution contours representing each type. Figure 4
shows the magnitudes of the NFDs computed from the
high resolution contours.

The aircraft outlines are approximately
bilaterally symmetric, although quantization error
prevents them from being exactly symmetric. The
normalization procedure will always yield a NFD
whose inverse transform has starting point on the
real axis, and whose axis of symmetry coincides
with the real axis, given the FD of a bilaterally
symmetric contour. Which of the two points at
which the axis of symmetry intersects the contour
will actually be the starting point depends on the
ambiguity-resolving procedure described above.
The procedure generally favors the point furthest
from the origin of the complex plane, but
supplying a selected ambiguity-resolving coefficient to
the normalization subroutine can reverse this. In
case both possible starting points are
approximately equidistant from the origin, the starting point
resulting from normalization is somewhat
unpredictable. This is the situation in which it is
advisable to check that the unknown FD is normalized
using the same ambiguity-resolving coefficient as
the test FD.

Since the actual experimental contours
investigated were not perfectly bilaterally symmetric,
the normalization subroutine did not always result
in a starting point which falls on the best
estimate of the axis of symmetry. However, since the
algorithm was written to reject noise, the
starting point was always quite close to the axis of
symmetry.

THE THREE-DIMENSIONAL PROBLEM

It has been shown [2] that averaging the FDs
of two different shapes (frequency domain) yields
a FD which will inverse transform to a shape which
appears to be an "average" contour, intermediate
in shape between the two original contours. The
data base which must be stored to represent a
three-dimensional object can be reduced by using
fewer projections, and "interpolating" between
them in the frequency domain. This approach also
enables a more accurate estimation to be made of
the actual orientation in space of the object
relative to the camera. Previous work on this
estimation problem [3], [4] has assumed that the
orientation of the unknown object is that of the
nearest reference projection. This evidently
limits estimation accuracy to the resolution of the
reference projection set.
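
Interpolation between two library FDs is a weighted sum of their
coefficients, and by linearity its inverse transform is the same
weighted sum of the two contours. The sketch below illustrates this
with dictionary FDs; the function names, the weight convention, and
the coefficient values are assumptions made for the example.

```python
import cmath


def interpolate_fd(fd_a, fd_b, weight):
    """Return (1 - weight) * fd_a + weight * fd_b, coefficient by coefficient."""
    keys = set(fd_a) | set(fd_b)
    return {k: (1 - weight) * fd_a.get(k, 0) + weight * fd_b.get(k, 0) for k in keys}


def contour_from_fd(fd, n):
    """Inverse transform of a sparse FD at n sample points."""
    return [
        sum(a * cmath.exp(2j * cmath.pi * k * t / n) for k, a in fd.items())
        for t in range(n)
    ]


# Two illustrative normalized FDs standing in for adjacent projections.
fd_a = {1: 1.0, -1: 0.5}
fd_b = {1: 1.0, -2: 0.3}
halfway = interpolate_fd(fd_a, fd_b, 0.5)

n = 12
contour_a = contour_from_fd(fd_a, n)
contour_b = contour_from_fd(fd_b, n)
contour_halfway = contour_from_fd(halfway, n)
```

Because the operation is a per-coefficient weighted sum, no search
over size, orientation, or starting point is needed once the
library FDs are normalized.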

INTERPOLATION  AND  SAMPLING  ERROR 

One theoretical problem concerns the sample
spacing used in sampling a time domain contour.
As explained in [5], a uniform sampling strategy
is employed in the present algorithm, in order to
facilitate analysis of a wide variety of shapes.
While non-uniform sampling can result in faster
convergence of a FD [2], there are obvious
complications involved in defining such a sampling
strategy for general shapes. When operations on FDs
are made in the frequency domain, there is no
guarantee that the resulting time domain
representation will have uniform sampling. Hence, even if
a contour appeared identical to an "average"
contour computed by using linear combinations of
known FDs, different sample spacing could result
in some finite difference between the two FDs.

Consider the case in which an unknown projection lies
directly between two library projections. Due to linearity,
given two FDs a and b, a weighted sum of a and b transforms
back as the same weighted sum of the transforms of a and b.
It is easy to perform an experiment to measure the magnitude
of this error. Simply computing the interpolated FD, inverse
transforming, resampling uniformly, and transforming gives
us two FDs whose M.S. distance is a measure of the point
density error. Experiments with two shapes whose NFDs had a
distance of about .3 (a distance less than .1 is generally
used as a classification threshold) showed a point density
error 190 to 200 times less than the distance between the
original NFDs, with the interpolation coefficients equal to
one half. The number of points used in the space domain
vectors had a slight effect on the error, with more densely
sampled vectors producing slightly less error. It can be
concluded that this problem should not have a noticeable
effect on the algorithm, since in practice, adjacent
projections can be expected to have a NFD distance much less
than .3. This fact should further minimize the point density
error.
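
The point density experiment described above can be sketched
in a few lines. This is a minimal illustration, not the
authors' program: it uses plain (unnormalized) FDs, a direct
DFT instead of an FFT, and two hypothetical test contours (an
ellipse and a circle). All function names are assumptions
introduced here.

```python
import cmath
import math

def ellipse(a, b, m):
    """m points on an ellipse with semi-axes a and b (equal steps in angle)."""
    return [complex(a * math.cos(2 * math.pi * t / m),
                    b * math.sin(2 * math.pi * t / m)) for t in range(m)]

def dft(z):
    """Direct discrete Fourier transform, scaled by 1/n."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]

def idft(c):
    """Inverse of dft() above."""
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            for t in range(n)]

def resample_uniform(z, n):
    """n points equally spaced in arc length along the closed polygon z."""
    closed = z + [z[0]]
    seg = [abs(closed[i + 1] - closed[i]) for i in range(len(z))]
    total = sum(seg)
    out, i, acc = [], 0, 0.0
    for m in range(n):
        target = total * m / n
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        frac = (target - acc) / seg[i] if seg[i] > 0 else 0.0
        out.append(closed[i] + frac * (closed[i + 1] - closed[i]))
    return out

def distance(a, b):
    """Root of the summed squared differences between two FDs."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

def point_density_error(contour_a, contour_b, n=32):
    """Return (distance between the two FDs, point density error at weight 1/2)."""
    fa = dft(resample_uniform(contour_a, n))
    fb = dft(resample_uniform(contour_b, n))
    mid = [0.5 * (x + y) for x, y in zip(fa, fb)]   # interpolated FD
    back = resample_uniform(idft(mid), n)           # inverse transform, resample
    return distance(fa, fb), distance(mid, dft(back))
```

Calling `point_density_error(ellipse(2.0, 1.0, 128),
ellipse(1.0, 1.0, 128))` returns the two distances to be
compared; for these test contours the second value should
come out well below the first.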

THE  ESTIMATION  PROBLEM 

In the example discussed above, a projection was assumed to
lie directly between two library projections, and the
experiment was performed accordingly. In general, however,
any random projection will not lie on the grid defined by a
library of projections. Hence more than two library
projections must be used to perform the estimation. If a
rectangular grid of projections is used, it would seem
reasonable to do the estimation based on four library
projections, but consider instead the general case of
estimating an N-vector X(k) as a linear combination of M
N-vectors $Y_i(k)$, $1 \le i \le M$:
$$\hat{X}(k) = \sum_{i=1}^{M} a_i\, Y_i(k) \qquad (2)$$

subject to the restriction that

$$\sum_{i=1}^{M} a_i = 1.$$

It is straightforward to show that the optimum linear mean
square estimate of X(k) is given by the solution to the
equations

$$\sum_{j=2}^{M} a_j \,\mathrm{Re}\sum_{k} D_i^{*}(k)\, D_j(k) = \mathrm{Re}\sum_{k} D_i^{*}(k)\,[X(k) - Y_1(k)], \qquad i = 2, \ldots, M,$$

where $D_i(k) = Y_i(k) - Y_1(k)$ and $a_1 = 1 - \sum_{i=2}^{M} a_i$.


DATA REDUCTION

The time required to execute the above estimation algorithm
is dependent on the dimension (N) of the vectors. It is thus
desirable to reduce the dimension of the vectors as far as
possible without degrading the classification performance.
Equally important is the problem of storage of library data
representing a three-dimensional object. Previous FD
classification results have indicated that there is no
advantage in using more than 30 (complex) coefficients. In
fact, quite good results have been obtained with only 14,
although there was a slight degradation in performance when
compared with 30.

The obvious approach is to estimate the autocorrelation
matrix or covariance matrix of the data, and find the
eigenvalues and eigenvectors which provide optimal data
compression. There can be some difficulty in computing
eigenvalues and eigenvectors of a 60 by 60 matrix. (Our
feature vector consists of 30 complex coefficients.) One way
to reduce the size of this matrix is to convert the data to
30 real coefficients. There are two ways that this might be
done. The most obvious way is to simply take the magnitudes
of the FD coefficients, since classifications based on
magnitude information alone have been shown to be quite
effective. However, if the data has bilateral symmetry, the
associated NFDs should automatically be real. Even if the
data does not have this symmetry, the normalization
procedure tends to minimize the magnitudes of the imaginary
parts of the NFD, and correspondingly minimize their
contribution to the classification. Hence virtually all the
information can be preserved by simply taking the real part
of each NFD coefficient. This is the approach that was used
in the experiment described below.

THE 3D ALGORITHM


An experiment was performed in which unknown aircraft
outlines were identified and their orientation in space
estimated using the above results. First, a set of aircraft
was synthesized using a graphics approach. Three-dimensional
approximations were constructed for six different aircraft,
a Mirage, a MiG, a Phantom, an F104, an F105, and a B57.
Figure 5 shows representative images generated by this
program. These three dimensional images were then rotated
through appropriate angles to create a library of
projections. The program was first given the library, and
then given randomly selected orientations to identify.

The experiment of Dudani et al [3] was very similar, but
several important differences should be noted. First, the
data used by Dudani was constructed using model aircraft and
a television camera hookup. It might appear that this is a
more realistic approach, as well as a more demanding
experiment than one using graphically generated data.
However there are two problems with this method which the
graphical method avoids. First, the resolution of the
mechanical mount used by Dudani was 5 degrees. This
represents an error in data generation which is avoided by
the more exact graphics approach. Second, since the camera
is a finite distance from the model, parallax problems
affect the images, making the camera image different from
the image received from a long distance. Since most
practical photographs of actual aircraft in flight would be
at a large distance, this error is undesirable.

In  addition  to  the  above  considerations,  it 
would  probably  be  easier  to  use  graphics  tech- 
niques in  a practical  system,  since  accurate 
graphical  representations  constructed  from  blue- 
prints would  probably  generate  library  data  faster 
and  more  accurately  than  model-TV  camera  setups. 

A major advantage of Dudani's approach is the accuracy of
the aircraft shape. Our graphics program approximates each
plane by using about 50-100 (geometric) planes. A more
elaborate program could generate an arbitrarily accurate
representation of each aircraft, with corresponding increase
in computation time. The present data gives a reasonably
good approximation to each aircraft, although the small
detail is lacking. The effect on the classifications is
probably to increase their difficulty, since certain minor
features are missing. On the other hand, the data can
probably be represented by fewer projections, due to the
reduced complexity.

The basic problem considered by Dudani was classifying
unknown aircraft images oriented at 5 degree intervals in a
140 degree by 90 degree sector. Each aircraft was
represented by a library of moment feature vectors computed
from 551 projections within this sector. The classification
was performed by computing distances from a moment feature
vector of the unknown image to the moment feature vectors of
the library images, and then classifying using a
distance-weighted k-nearest neighbor rule. Note that the
images used by Dudani did not contain any mirror image
pairs, and hence are obviously not all the images which can
be theoretically recognized. In fact, a little reflection
will convince one that if an object has enough asymmetry, it
can be recognized at any angle at all, and that angle
identified. In the case of aircraft, there is generally
bilateral symmetry, but this does not necessarily greatly
limit the set of angles which can be theoretically
recognized.

Our algorithm recognizes aircraft outlines taken from a
sector of 180 by 180 degrees, i.e., a hemisphere. Note also
that if the angles near the front view and rear view of the
aircraft are deleted, the problem is much easier, since the
shapes vary much more radically when large surfaces are
viewed almost edgewise. Dudani's consideration of only 140
degrees reduces this problem. Our algorithm also recognizes
random projections. There is no quantization of random
projections corresponding to Dudani's 5 degree increments.
Finally, the first version of our algorithm uses only 99
projections to represent an aircraft over the hemisphere,
which represents a density of projections 14.3 times less
than used by Dudani.
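
The density comparison can be checked with a short
calculation. The sketch below assumes that Dudani's library
samples a 140 by 90 degree sector every 5 degrees with both
ends of each range included, and that density is measured as
projections per square degree of sector. Both conventions are
assumptions made here for illustration.

```python
def grid_count(extent_deg, step_deg):
    """Number of grid positions across an extent, counting both end points."""
    return extent_deg // step_deg + 1

dudani_views = grid_count(140, 5) * grid_count(90, 5)   # 29 * 19 projections
dudani_density = dudani_views / (140 * 90)              # per square degree
our_density = 99 / (180 * 180)                          # 99 views over the hemisphere
ratio = dudani_density / our_density
```

Under these assumptions the grid holds 551 projections and
`ratio` is about 14.3, in agreement with the figures quoted in
the text.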

The actual classification program proceeds as follows.
First, the library of projections is computed, and the NFD
of each projection is computed. The autocorrelation matrix
of the NFDs is computed, and an eigenvalue-eigenvector
transformation reduces the data dimensionality from 30
complex numbers to 5 complex numbers. The real parts of the
complex numbers are used to compute the autocorrelation
matrix, but the complex parts of the transformed
coefficients are kept to assist in the classification.
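
The eigenvector reduction step can be illustrated as follows.
This is a sketch under stated assumptions, not the authors'
Fortran routine: it works on real-valued feature vectors only,
and it finds the leading eigenvectors by power iteration with
deflation, a technique chosen here for brevity. The function
names are hypothetical, and the fixed starting vector is
assumed not to be orthogonal to any wanted eigenvector.

```python
import math

def autocorrelation(vectors):
    """Average outer product of the (real) feature vectors."""
    n, d = len(vectors), len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(d)]
            for i in range(d)]

def top_eigenpairs(matrix, m, iters=200):
    """Leading m (eigenvalue, eigenvector) pairs of a symmetric matrix."""
    d = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(m):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        for i in range(d):            # deflate: remove the component just found
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return pairs

def reduce_vector(x, pairs):
    """Project x onto the retained eigenvectors."""
    return [sum(xi * vi for xi, vi in zip(x, v)) for _, v in pairs]
```

In the setting of the text, `m` would be 5 and each input
vector would hold the 30 real parts of an NFD.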

Next the M.S. distance from a given unknown contour to each
library vector is computed. The distance to the nearest
library contour is saved as the current best estimate of the
minimum M.S. distance achievable. The projections adjacent
to the nearest library projection are investigated by the
M.S. estimation algorithm described above in an attempt to
interpolate between the library projections.
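
The constrained estimate used at this step can be sketched as
a small least-squares problem. The code below is an
illustration only: it assumes real-valued feature vectors,
eliminates the first coefficient through the sum-to-one
restriction, and solves the resulting normal equations by
Gaussian elimination. The function names are hypothetical, and
no protection is included against linearly dependent library
vectors.

```python
def solve(a, b):
    """Solve the linear system a x = b by elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def estimate_coefficients(x, ys):
    """Weights a_i with sum(a_i) == 1 minimizing |x - sum_i a_i * ys[i]|^2."""
    y1 = ys[0]
    d = [[yi - y1i for yi, y1i in zip(y, y1)] for y in ys[1:]]   # difference vectors
    e = [xi - y1i for xi, y1i in zip(x, y1)]
    gram = [[sum(p * q for p, q in zip(di, dj)) for dj in d] for di in d]
    rhs = [sum(p * q for p, q in zip(di, e)) for di in d]
    rest = solve(gram, rhs)
    return [1.0 - sum(rest)] + rest
```

For example, `estimate_coefficients([0.25, 0.25], [[0.0, 0.0],
[1.0, 0.0], [0.0, 1.0]])` yields weights of 0.5, 0.25 and 0.25.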

The interpretation of the estimation coefficients returned
by the estimation subroutine is somewhat heuristic, and goes
something like this. This routine is constrained to return
four numbers whose sum is 1.0. It often happens that two of
the vectors being used to estimate the unknown contour are
not very similar to the unknown contour, but are quite
similar to each other. In this case, the estimation
coefficients are of similar magnitudes and opposite sign,
such as 2.0 and -2.05. What the estimation algorithm is
doing is using the difference vector to help generate the
optimum M.S. estimate of the unknown vector. We of course do
not want to allow this kind of estimation, since it is
inconsistent with our theory of interpolation of FDs.
Another thing which is commonly observed when the unknown
vector differs from the library vectors being used to
estimate it is a set of large positive and negative
estimation coefficients being returned. This again just
tells us that we cannot expect to find a reasonable
interpolated FD in the sector determined by that set of
library projections.

The heuristic solution to these effects is as follows.
First, we quit looking in a particular sector if the
estimation coefficients returned are too large in magnitude.
The algorithm is not very sensitive to this magnitude, and
1.5 to 2.0 is usually used. Also, if two coefficients sum to
a small number (.1), but have relatively large magnitudes
(>.5) they are assumed to be cancelling coefficients, and
are deleted from the estimation set. The estimation process
is then repeated with the remaining two vectors being used
to estimate the unknown set. Similarly, any negative
coefficients are deleted from an estimation, and the
remaining two or three are used to repeat the estimation
process. When an estimation of the unknown vector in terms
of two, three, or four adjacent library projection vectors
is achieved in which all the coefficients lie between zero
and one, the distance is compared to the minimum distance
achieved so far. If the new distance is less, the minimum
distance is updated.
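
The screening rules above can be written as a small decision
function. This is one possible reading of the heuristic, not
the authors' code: the function name, the return convention
and the default thresholds (2.0, 0.1 and 0.5, taken from the
ranges quoted in the text) are assumptions.

```python
def screen(coeffs, max_mag=2.0, cancel_sum=0.1, cancel_mag=0.5):
    """Classify one sector's coefficients.

    Returns ("accept", kept) when every coefficient lies in [0, 1],
    ("retry", kept) when the estimate should be repeated with the
    vectors whose indices are in kept, and ("reject", []) otherwise.
    """
    if any(abs(c) > max_mag for c in coeffs):
        return "reject", []
    keep = set(range(len(coeffs)))
    for i in range(len(coeffs)):
        for j in range(i + 1, len(coeffs)):
            if (abs(coeffs[i]) > cancel_mag and abs(coeffs[j]) > cancel_mag
                    and abs(coeffs[i] + coeffs[j]) < cancel_sum):
                keep -= {i, j}                    # cancelling pair
    keep = {i for i in keep if coeffs[i] >= 0.0}  # drop negative coefficients
    kept = sorted(keep)
    if len(kept) == len(coeffs) and all(c <= 1.0 for c in coeffs):
        return "accept", kept
    return ("retry", kept) if len(kept) >= 2 else ("reject", [])
```

A caller would repeat the estimation with the vectors listed
in `kept` whenever the result is `"retry"`, and compare
distances only on `"accept"`.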

This process may be repeated for the k nearest library
projections, where k is optional, and is generally in the
range of 4-10. If the distances to the nearest k projections
are approximately equal, the full k projections will be
investigated. However, projections whose distances are more
than 1.5 to 2.0 times greater than the minimum distance are
not investigated. Each library projection has one, two, or
four sectors surrounding it which must be investigated by
the estimation subroutine. (If the projection is in the
middle of the library set, there are four, if it is on the
border, there are two, and if it is in a corner, there is
one.) After the desired number of possible sectors are
investigated, there are two possible procedures. The
estimated orientation is taken to be that of the original
nearest library vector, if the estimation fails to improve
on this distance. If the estimation procedure is successful,
the orientation is computed by multiplying the orientations
of the vectors used in the estimation by their appropriate
coefficients and summing.
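
As an illustration of the last step, the interpolated
orientation is a coefficient-weighted sum of the library
orientations. The sketch assumes each orientation is stored as
a pair of angles in degrees (called azimuth and elevation
here, names not used in the text) and ignores angle
wrap-around.

```python
def interpolate_orientation(orientations, coeffs):
    """Coefficient-weighted sum of (azimuth, elevation) pairs, in degrees."""
    azimuth = sum(c * o[0] for o, c in zip(orientations, coeffs))
    elevation = sum(c * o[1] for o, c in zip(orientations, coeffs))
    return azimuth, elevation
```

With four corner projections at (0, 0), (20, 0), (0, 20) and
(20, 20) degrees and equal coefficients of 0.25, the estimate
falls at the center of the sector, (10, 10).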

Results to date using 6 aircraft, and classifying 50 unknown
images for each one show classification accuracy of 80%
overall. The classification accuracy is about 72% if the
estimation routine is not used.


It is expected that this figure can be improved by making
use of the real and imaginary parts of the data when
computing the eigenvector transformation matrix. Also, the
spacing of library projections used in this experiment is
non-uniform in an attempt to increase the Fourier space
distance uniformity of the projections set, but it is not
optimum. Finally, it may be necessary to increase the
density of projections somewhat to bring the classification
accuracy up to that achieved by Dudani. The current attempt
to get by with a projection density at 14.3 times lower than
Dudani may be too optimistic.

Given  a chain  code  representation  of  an  air- 
craft projection,  the  normalized  FD  is  computed  in 
about  2.76  sec.  This  time  includes  an  FFT  which 
is  of  length  512  or  1024.  This  normalized  FD  is 
then  classified  and  its  orientation  estimated  in 
about  2.38  sec.  These  times  are  for  a PDP  11/45 
with  floating  point  hardware.  The  program  itself 
is  written  in  Fortran  and  is  a research  tool  rath- 
er than  a highly  efficient  implementation  of  the 
algorithm. 
Fig. Inverse Transforms of FD's Consisting of Selected
Coefficients

Upper Left: A(1) = 1.0, A(4) = 0.2
Upper Right: A(1) = 1.0, A(-2) = 0.2
Lower Left: A(1) = 1.0, A(7) = 0.2
Lower Right: A(1) = 1.0, A(-5) = 0.2

Figure 5 Representative Aircraft Images

ABSTRACT

We present a fast algorithm for pulse width estimation from
blurred and nonlinear observations in the presence of signal
dependent noise. The main application is the accurate
measurement of image sizes on film. The problem is
approached by modeling the signal as a discrete position
finite state Markov process, and then determining the
transition location that maximizes the a posteriori
probability. The method was applied to the measurement of
the width of a road in an aerial photo taken at an altitude
of 5000 feet. The resulting width estimate is accurate to
within a few inches.


INTRODUCTION 

This  work  presents  a fast  algorithm  for  pulse 
width  estimation  from  blurred  and  nonlinear  obser- 
vations in  the  presence  of  signal  dependent  noise. 
The  problem  is  motivated  by  the  need  for  accurate 
measurements  from  remotely  sensed  photographs. 

The problem is approached by modeling the signal (reflected
light intensity) as a discrete position finite state Markov
process. Sample functions of such a process are graphically
represented by a path through a trellis. Blurred versions of
these signals are similarly represented. By assigning a cost
or length to each branch of the trellis a MAP sequence
estimate of the signal is computed by finding the minimum
cost or minimum length path through the trellis. MAP
sequence estimates produced in this fashion have unambiguous
edge locations making them useful for pulse width
measurements.

The Viterbi algorithm is introduced as an efficient means of
finding the minimum cost path through the trellis. When the
possible states are known a-priori the algorithm produces
asymptotically unbiased, minimum variance discrete width
estimates. The decrease in performance is slight if the true
states are unknown and estimates (obtained from the
available data) used in their place.

Computer simulation results show the variance of the
discrete estimates is close to the Cramer-Rao bound. The
algorithm is applied to the measurement of a road in an
aerial photo taken at an altitude of 5000 feet. The
resulting width estimate is accurate to within a few inches.
Experimental results also indicate the estimates are not
sensitive to small variations in the degrading system.


MODEL  FOR  A STEP  EDGE 

A scan line across a step edge is characterized by an
initial level $a_1$ at position 1, a final level $a_2$ at
position M and an abrupt transition between levels $a_1$ and
$a_2$ somewhere between positions 1 and M. If the $k$-th
sample along the line, $i_k$, has value $a_1$ the next sample
$i_{k+1}$ can assume the value $a_1$ with some probability
(say $p_{11}$) or can assume the value $a_2$ with some
probability $p_{12}$. If sample $i_k$ has a value of $a_2$
then $i_{k+1}$ must also be $a_2$. Thus a simple Markov model
for step edges is:

$$\Pr(i_1 = a_1) = 1$$

$$\Pr(i_M = a_2) = 1$$

$$\Pr(i_{k+1} = a_j \mid i_1, \ldots, i_k) = \Pr(i_{k+1} = a_j \mid i_k)$$

where

$$\Pr(i_{k+1} = a_1 \mid i_k = a_1) = p_{11}$$

$$\Pr(i_{k+1} = a_2 \mid i_k = a_1) = p_{12}$$

$$\Pr(i_{k+1} = a_2 \mid i_k = a_2) = 1 \qquad (1)$$


This model can be represented graphically as shown in Figure
1(a). The nodes represent possible intensity levels at each
position k. The dotted lines between nodes (branches)
represent possible transitions between levels. Each branch
has been labeled with the transition probability
$P(i_{k+1} \mid i_k)$ that corresponds to the nodes the
branch connects. Each possible path through the trellis
along the dotted lines represents a different edge location.
Figure 1(b) models the case where there is uncertainty about
the levels at the initial or final position. In particular
paths from position 1 to position M along the top or bottom
of the trellis correspond to the absence of an edge.

MODEL  FOR  A BLURRED  EDGE 

Assume that a real or blurred edge is adequately modeled by
the output of a blurring system h when the input to h is the
ideal edge. The system h may be nonlinear; however, it is
assumed that the $k$-th sample of the output of h, $y_k$,
depends only on the 2w+1 adjacent inputs
$(i_{k-w}, \ldots, i_k, \ldots, i_{k+w}) = \xi_k$ for some
$w < \infty$, and that there is a one to one correspondence
between the output sequence $Y = (y_1, \ldots, y_M)$ and the
input state sequence $\Xi = (\xi_1, \ldots, \xi_M)$. Since
there are only a finite number of values that $i_k$ can
assume (two in the case of step edges) there are only a
finite number of states $\xi_k$. We denote the input sample
sequence by $I = (i_1, \ldots, i_M)$. Then, from (1):

$$\Pr(y_1 = h(a_1, \ldots, a_1)) = 1$$

$$\Pr(y_{k+1} = h(a_1, \ldots, a_1) \mid y_k = h(a_1, \ldots, a_1)) = p_{11}$$

$$\Pr(y_{k+1} = h(a_1, \ldots, a_1, a_2) \mid y_k = h(a_1, \ldots, a_1)) = p_{12}$$

$$\Pr(y_{k+1} = h(a_1, \ldots, a_2, a_2) \mid y_k = h(a_1, \ldots, a_1, a_2)) = 1$$

$$\vdots$$

$$\Pr(y_{k+1} = h(a_2, \ldots, a_2) \mid y_k = h(a_1, a_2, \ldots, a_2)) = 1$$

$$\Pr(y_M = h(a_2, \ldots, a_2)) = 1 \qquad (2)$$

Thus a blurred edge can be represented by a path through a
trellis as shown in Figure 2.

Since there is a one to one correspondence between $I$,
$\Xi$ and $Y$ any one of the three can be uniquely
represented by a path through the trellis.

MAXIMUM A POSTERIORI PROBABILITY SEQUENCE ESTIMATION

The maximum a-posteriori probability (MAP) estimate of a
sequence $I$ given a sequence $Z$, $Z$ being a degraded and
noisy version of $I$, is defined as a sequence
$\hat{I} = (\hat{i}_1, \hat{i}_2, \ldots, \hat{i}_M)$ for
which $P(I \mid Z)$ evaluated at $I = \hat{I}$ is a maximum.
To calculate $\hat{I}$ a model is needed for the relationship
between $Z$ and $I$. The assumed observation model is shown
in Figure 3. $I = (i_1, i_2, \ldots, i_M)$ is the sequence of
ideal light intensities with $i_k$ the light intensity of the
$k$-th sample point entering the imaging system. Each $i_k$
can assume one of G possible values $a_1, \ldots, a_G$. For
example $I$ might represent the sequence of reflected light
intensities from a scan line of an aerial photo of a bridge
across a river. In this case there would be two possible
intensity levels: $a_1$ corresponding to the light reflected
from the water and $a_2$ corresponding to the light reflected
from concrete (or whatever construction material was used in
the bridge). The state at position k, $\xi_k$, is defined to
be a set of adjacent intensities
$(i_{k-w}, \ldots, i_k, \ldots, i_{k+w})$. Since each $i_k$
can assume only a finite number of values each $\xi_k$ is one
of a finite set $(S_1, \ldots, S_q)$. Further (to within
boundary conditions) there is a one to one correspondence
between the state sequence $\Xi$ and the intensity sequence
$I$.

The system $h(\cdot)$ represents the degradation of the
sequence $I$. In the case of photographic imagery this
includes blurring due to lens defects, scattering,
diffraction, camera motion, etc. as well as the nonlinear
relationship between light intensity and film density. The
only assumption on h is that there is a one to one
correspondence between $Y = (y_1, \ldots, y_M)$ (where
$y_k = h(\xi_k)$) and $\Xi$.

$N$ is a sequence of independent noise samples. The
parameters of the noise distribution may depend on the
signal. For example film grain noise is approximately normal
with a standard deviation approximately proportional to a
power of the signal level.

By definition the MAP sequence estimate $\hat{I}$ of $I$ is
the sequence for which

$$P(I \mid Z)\big|_{I = \hat{I}} \text{ is a maximum} \qquad (3)$$

but since there is a one to one correspondence between $I$
and $\Xi$, (3) is equivalent to

$$P(\Xi \mid Z)\big|_{\Xi = \hat{\Xi}} \text{ is a maximum} \qquad (4)$$

However, by Bayes rule, the independence of the noise and a
Markov assumption on $\Xi$, (4) is equivalent to maximizing:

$$\prod_{k=1}^{M} P(\xi_{k+1} \mid \xi_k)\, p(z_k \mid h(\xi_k)) \qquad (5)$$

or equivalently minimize

$$\sum_{k=1}^{M} \left[ -\log P(\xi_{k+1} \mid \xi_k) - \log p(z_k \mid h(\xi_k)) \right] \triangleq \sum_{k=1}^{M} r(\xi_k) \qquad (6)$$

By assigning a cost or length of $r(\xi_k)$ to each branch of
a trellis it is easy to see that the MAP estimate $\hat{I}$
represents the lowest cost or minimum length path through
the trellis.

There is a very good algorithm for finding the minimum cost
path through a trellis due to Viterbi [1,2] called the
Viterbi algorithm (VA).
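
The minimum cost path search can be illustrated on the
simplest case. The sketch below is not the implementation used
in this work: it assumes no blur (w = 0, so each state is a
single level), Gaussian noise whose variance depends on the
level, a single step from the first level to the second, and a
hypothetical transition probability `p11`. Each branch cost is
the negative log transition probability plus the negative log
likelihood of the observation.

```python
import math

def viterbi_step_edge(z, levels, variances, p11=0.9):
    """MAP state sequence (0 = first level, 1 = second level) for scan line z."""
    def obs_cost(value, s):
        var = variances[s]
        return 0.5 * math.log(2 * math.pi * var) + (value - levels[s]) ** 2 / (2 * var)

    inf = float("inf")
    # trans[r][s] is the cost of moving from state r to state s.
    trans = [[-math.log(p11), -math.log(1.0 - p11)], [inf, 0.0]]
    cost = [obs_cost(z[0], 0), inf]      # the line must start at the first level
    back = []
    for value in z[1:]:
        new_cost, pointers = [], []
        for s in (0, 1):
            best = min((0, 1), key=lambda r: cost[r] + trans[r][s])
            new_cost.append(cost[best] + trans[best][s] + obs_cost(value, s))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    state = 1                            # the line must end at the second level
    path = [state]
    for pointers in reversed(back):      # trace the surviving path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

The edge location is the first index at which the returned
path equals 1. Handling blur would require enlarging the state
to the 2w+1 adjacent levels, as described in the text.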

UNKNOWN  LEVELS 

The previous sections assumed the levels $a_1, a_2, \ldots$
that the scan line could assume were known a-priori. In
practice this may not be true. In this case a reasonable
course of action is to obtain "training" samples of levels
characterizing the object to be measured and its background,
and then estimate the level values from these training
samples.
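
A minimal sketch of this training step follows, assuming the
level and its noise variance are estimated by the sample mean
and the unbiased sample variance. The choice of estimators and
the function name are assumptions, since the text does not
specify them.

```python
def level_statistics(samples):
    """Sample mean and unbiased sample variance of a list of training densities."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, variance
```

One call per region (object and background) gives the level
values and variances needed by the sequence estimator.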

MEASUREMENT  OF  A ROAD 

A 1:5000 scale black and white negative taken with Kodak
Plus X Aerographic film from an altitude of 5000 feet was
obtained and digitized on a flying spot scanner. The scene
(sampled at a rate of 24 samples/mm) is shown in Figure 4
and contains an intersection of two gravel roads in Warren
County, Indiana. Figure 5 shows one of the roads (sampled at
a rate of 96 samples/mm) and Figure 6 shows a scan line
across the road of Figure 5. Five hundred training samples
from one of the roads showed the average density was .942
with a variance of .00213. One thousand training samples
from the field surrounding the road had an average density
of .669 with a variance of .00236. The nominal film
properties were obtained from Torkington [3] and Paris [4].
The frequency response of the image blur was assumed to be
the product of the film frequency response


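$$T_f(f) = \frac{1}{1 + \cdots} \qquad (7)$$

and the response of an ideal diffraction limited lens with a
cutoff frequency of half the sampling rate:

$$T_l(f) = \frac{2}{\pi}\left[\cos^{-1}\!\left(\frac{f}{f_c}\right) - \frac{f}{f_c}\sqrt{1 - \left(\frac{f}{f_c}\right)^{2}}\,\right], \qquad f_c = 48 \text{ cycles/mm} \qquad (8)$$

Figure 7 shows the line spread function corresponding to
equations (7) and (8).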

Ten independent measurements of one of the roads were made
and the results shown in Table 1. The variance of each $y_k$
was taken to be .00213 if $y_k$ corresponded to a state where
$i_k$ was a sample from the road or .00236 if $y_k$
corresponded to an $i_k$ from the field.

The variance of the digital measurement was 1.15 sample
points squared and the uncertainty indicated in Table 1
represents plus or minus two standard deviations.

Ten optical measurements were made with a 50 power magnifier
and a reticle marked in thousandths of an inch. Table 1
shows the results and uncertainty of this measurement. Again
the uncertainty represents two standard deviations.

The site of the road was visited and the width found to be
18' 11" with a tape measure. There is a fair amount of
uncertainty connected with this measurement. The edges of
the road are characterized by vegetation which can overhang
or encroach upon the road by several inches on either side.
Measurements on similar roads varied from 18' 6" to 19' 10".
Therefore, the true width of the road the day the photograph
was taken is not known exactly.

Table 1. Road Width Measurement Results

Method          Road Width on Film    Width on Ground
VA              110.5 s.p. ± 2        [illegible] (18'10" ± 4")
Optical         .0456" ± .0024        5.79 m ± .50 (19'2" ± 12")
Tape Measure                          5.76 m ± .15 (18'11" ± 6")

Finally, the effect of the cutoff frequency $f_c$ in
equation (8) was examined. Line spread functions for
different values of $f_c$ were calculated and the ten
measurements were repeated. The results are shown in Table
2.

Table 2. The Effect of f_c on Width Estimates

f_c (cycles/mm)    Width (sample points)    Variance
56                 110.3                    1.20
48                 110.2                    1.15
40                 109.8                    1.1
32                 109.1                    .96

Table 2 indicates the width estimates produced by the VA are
not overly sensitive to imperfect knowledge of the degrading
system.
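
For orientation, a width in sample points can be converted to
a ground distance from the sampling rate (96 samples/mm) and
the photo scale (1:5000). The sketch below applies this to the
Table 2 entry for 48 cycles/mm and to two standard deviations
of the stated variance of 1.15. The function name is
hypothetical and the conversion ignores any scale variation
across the frame.

```python
import math

def ground_width_m(width_samples, samples_per_mm=96.0, scale=5000.0):
    """Convert a width in sample points on film to meters on the ground."""
    return width_samples / samples_per_mm * scale / 1000.0

width = ground_width_m(110.2)                       # width estimate, meters
two_sigma = ground_width_m(2.0 * math.sqrt(1.15))   # two standard deviations, meters
```

This gives a width of about 5.74 m with a two standard
deviation uncertainty of about 0.11 m, that is, roughly four
inches.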
Fig. 5 Enlargement of a section of one of the roads in
Fig. 4.

Fig. 7 Line spread function of the film-lens combination.


ADAPTIVE  THRESHOLD  FOR  AN  IMAGE  RECOGNITION  SYSTEM 


D.  Serreyn  and  R.  Larson 


HONEYWELL  INC. 
Systems  and  Research 
Minneapolis,  Minnesota  55413 


ABSTRACT 

An adaptive object extraction algorithm
(Autothreshold), using pixel classification and
Sobel edges, has been implemented in hardware
and tested on recorded real time FLIR
imagery. The results show improvement in
man-made object detection and reveal certain
problems with using commercially available
CCD's.


AUTOSCREENER 

A major  problem  in  real  time  target 
image  recognition  is  the  large  bandwidth  of  the 
data.  The  information  bandwidth  must  be  re- 
duced by  orders  of  magnitude  before  recogni- 
tion can  be  performed.  The  Autoscreener 
performs  this  bandwidth  reduction  in  two  stages. 
The  first  is  image  segmentation,  that  extracts 
subimages  of  potential  interest.  The  second 
is  to  classify  these  as  either  man  made  objects 
(MMO)  or  clutter.  The  MMO's  are  then  the  low 
bandwidth  input  to  the  recognition  algorithm. 

In  FLIR  imagery,  hot  areas  are  gener- 
ally associated  with  targets.  However,  intensity 
thresholding  produces  poor  image  segmentation. 
Edge  information  improves  the  segmentation. 

The Autoscreener extracts those parts of the
image where the intensity exceeds the background
estimate and which are bounded by edges. The
AFAL funded portion of this contract develops
a means of adapting the thresholds on edge and
intensity to produce segments that are insensitive
to changes in scene contrast and average intensity.

EDGE  EXTRACTION 

Edge detection is done by using a 3 x 3
Sobel operator with large values giving an
indication of a possible object of interest. The Sobel
edge is implemented using commercially
available CCD's for line delays and for pixel delays.
Clock noise and interscan line voltage levels are
major problems that were overcome. Circuit
layout is also very critical in using these devices.
On the positive side, these devices are usable for
scan line delays of three milliseconds without
noticeable signal degradation.

The absolute value of the horizontal edge
component is thresholded to generate a logical
edge signal. The vertical component was not used
because of banding in the FLIR data. (The
banding is due to improper balancing of the detector
at the time the data was taken. It is generally
not noticeable while observing several sequential
frames but is noticeable when a single frame is
used.)

The  edge  threshold  adapting  is  based  upon 
the  scan  line  averages.  The  edge  information  is 
averaged  over  one  line  of  the  image.  At  the  end 
of  the  scan  line  the  average  is  sampled  and  held 
for  the  next  scan  line.  This  sampled  value  is  then 
multiplied  by  a constant  to  provide  the  threshold 
for  the  incoming  edge  information. 
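
The line by line threshold adaptation can be sketched in
software as follows. This is an illustration of the scheme
described above, not the CCD hardware design: the kernel taken
here as the "horizontal" Sobel component (the difference along
the scan line), the gain value, and the treatment of the first
line (no threshold available, so no edges are marked) are all
assumptions.

```python
def horizontal_sobel(image):
    """Absolute value of one 3 x 3 Sobel component; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = abs(
                (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1])
                - (image[r - 1][c - 1] + 2 * image[r][c - 1] + image[r + 1][c - 1]))
    return out

def adaptive_edge_map(edge, gain=2.0):
    """Threshold each line at gain times the previous line's average edge value."""
    result, held = [], None
    for line in edge:
        if held is None:
            result.append([False] * len(line))       # no held average yet
        else:
            result.append([v > gain * held for v in line])
        held = sum(line) / len(line)                 # sample and hold for next line
    return result
```

Because the threshold comes from the preceding scan line, a
change in overall scene contrast raises or lowers it one line
later without any frame-level statistics.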

BACKGROUND  ESTIMATOR 

The  Autothreshold  makes  the  Autoscreener 
self  adaptive  to  background  and  contrast  changes. 
The  intensity  adapting  is  done  by  estimating  the 
background  intensity  at  each  pixel  location  while 
the  picture  is  being  scanned. 

Two background estimation methods have
been tested. One method is controlled by the pixel
classifier. This classifier uses intensity changes
to detect "target like" portions of the image and
the background estimate is updated
only at the "non-target like" pixels. The other
method simply uses a two dimensional lowpass
filter to estimate the background.

The  bright  information  is  obtained  by  sub- 
tracting the  background  estimate  from  the  video 
intensity.  The  difference  is  thresholded  using 
the  variability  of  adjacent  background  scan  line 
estimates. 
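
A one dimensional sketch of the classifier-controlled
background estimate and the resulting bright signal is given
below. The classifier used here (a pixel is "target like" when
it differs from the current estimate by `jump` or more), the
recursive update with weight `alpha`, and both parameter
values are stand-ins chosen for illustration; the text does
not give the actual classifier or filter.

```python
def bright_signal(line, alpha=0.25, jump=5.0):
    """Return (background estimates, pixel minus background) along one scan line."""
    background = line[0]
    estimates, bright = [], []
    for pixel in line:
        if abs(pixel - background) < jump:            # "non-target like": update
            background += alpha * (pixel - background)
        estimates.append(background)
        bright.append(pixel - background)
    return estimates, bright
```

The bright values would then be compared with a threshold
derived from the variability of the background estimates on
adjacent scan lines, a step not shown here.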
CONCLUSIONS


In addition to the hardware results, the
following observations may be of value to other
investigators:

• Line by line adaption of intensity
and edge thresholds is superior to
determining threshold values from
frame averages.

• Pixel  classification,  based  on  inten- 
sity differences,  produces  an  image 
segmentation  of  similar  quality  to 
that  produced  by  the  combined  edge 
and  bright  signals. 

• Detector  equalization  is  critical  in 
using  the  Sobel  algorithm  on  FLIR 
imagery. 

The  MMO  classifier  is  being  evaluated  at  this 
time  and  these  results  will  be  reported  at  the 
workshop. 
ABSTRACT

A number  of  image  analysis  tasks  can  benefit 
from  registration  of  the  image  with  a model  of  the 
surface  being  imaged.  Automatic  navigation  using 
visible  light  or  radar  images  requires  exact  align- 
ment of  such  images  with  digital  terrain  models.  In 
addition,  automatic  classification  of  terrain,  using 
satellite  imagery,  requires  such  alignment  to  deal 
correctly  with  the  effects  of  varying  sun  angle  and 
surface  slope.  Even  inspection  techniques  for  cer- 
tain industrial  parts  may  be  improved  by  this  means. 

We  achieve  the  required  alignment  by  matching 
the  real  image  with  the  synthetic  image  obtained 
from  a surface  model  and  known  positions  of  the 
light  sources.  The  synthetic  image  intensity  is 
calculated  using  the  reflectance  map,  a convenient 
way  of  describing  the  surface  reflection  as  a func- 
tion of  surface  gradient.  We  illustrate  the  tech- 
nique using  LANDSAT  images  and  digital  terrain  mod- 
els. 


MOTIVATION 

Interesting  and  useful  new  image  analysis  meth- 
ods may  be  developed  if  registered  image  intensity 
and  surface  slope  information  is  available.  Auto- 
matic change  detection,  for  example,  seems  unattain- 
able without  an  ability  to  deal  with  variations  of 
appearance  with  changes  in  the  sun's  position.  In 
turn,  these  variations  can  be  understood  only  in 
terms  of  surface  topography  and  reflectance  models. 
Similarly,  human  cartographers  consult  both  aerial 
photographs  and  topographic  maps  of  a region  to  es- 
tablish the  location  of  streamlines.  Automatic 
analysis  of  either  of  these  information  sources 
alone  is  unlikely  to  lead  to  robust  methods  for  per- 
forming this  task. 

Accurate  alignment  of  images  with  surface  models 
is  therefore  an  important  prerequisite  for  many  image 
understanding  tasks.  We  describe  here  an  automatic 
method  of  potentially  high  accuracy  that  does  not 
depend  on  feature  extraction  or  other  sophisticated 
image  analysis  methods.  Instead,  all  that  is  re- 
quired is  careful  matching  of  the  real  with  a syn- 
thetic image.  Because  this  is  an  area-based  pro- 
cess, it  has  the  potential  for  sub-pixel  accuracy  -- 
accuracy  not  attainable  with  techniques  dependent 
on  alignment  of  linear  features  such  as  edges  or 
curves. The method is illustrated by registering
LANDSAT images with digital terrain models.

POSSIBLE  APPROACHES 

One  way  to  align  a real  image  with  a surface 
model  might  be  through  the  use  of  a reference  im- 
age obtained  under  controlled  conditions.  New 
images  could  then  be  matched  against  the  reference 
image  to  achieve  alignment.  Unfortunately,  the 
appearance  of  a surface  depends  quite  dramatically 
on  the  position  of  the  light  source  (as  seen  in 
figure  1,  for  example),  so  that  this  method  works 
only  for  a limited  daily  interval  for  a limited 
number of days each year [1]. This problem
disappears when one uses synthetic images, since the
position of the source can be taken into account.

A more  sophisticated  process  would  not  match 
images  directly,  but  first  perform  a feature  ex- 
traction process  on  the  real  image  and  then  match 
these  features  with  those  found  in  the  reference 
image.  One  finds,  however,  that  different  features 
will  be  seen  when  lighting  changes:  for  example, 
ridges  and  valleys  parallel  to  the  illumination 
direction  tend  to  disappear  (see  figure  1 again). 

In addition, the apparent position of a feature as
well as its shape may depend somewhat on illumination.
More serious may be the computational cost and lack
of robustness of present feature extraction schemes.
Finally, we should note that the accuracy obtained by
matching linear features is likely to be lower than
that obtainable with a method based on an areal match.

One might consider calculating the shape of
the surface from intensities in the image [2].
This, however, is computationally expensive and not
likely to be very accurate in view of the variation
in the nature of surface cover. A more accurate
method, estimating the local gradient using similar
methods [3] and then matching these with gradients
stored in the terrain model, still involves a great
deal of computation.

The method chosen here depends instead on
matching the real image with a synthetic image
produced from the terrain model. The similarity of
the two images depends in part upon how closely the
assumed reflectance matches the real one. For
mountainous terrain and for images taken with low
sun elevations, rather simple assumptions about the
reflectance properties of the surface gave very
good results. Since all LANDSAT images are taken
at about 9:30 local solar time, the sun elevations
in this case are fairly small and image registration
for all but flat terrain is straightforward.

This implies that LANDSAT images are actually
not optimal for automatic terrain classification,
since the intensity fluctuations due to varying
surface gradients often swamp the intensity
fluctuations due to variations in surface cover. An
important application of our technique in fact is the
removal of the intensity fluctuations due to
variations in surface gradient from satellite images
in order to facilitate the automatic classification
of terrain. To do this, we must model the way the
surface reflects light.

THE  REFLECTANCE  MAP 

Work on image understanding has led to a need
to model the image-formation process. One aspect
of this concerns the geometry of projection, that
is, the relationship between the position of a point
and the coordinates of its image. Less well
understood is the problem of determining image
intensities, which requires modelling of the way
surfaces reflect light. For a particular kind of
surface and a particular placement of light sources,
surface reflectance can be plotted as a function of
surface gradient (magnitude and direction of slope).
The result is called a reflectance map and is usually
presented as a contour map of constant reflectance
in gradient space [3].

One use of the reflectance map is in the
determination of surface shape from intensities [2]
in a single image; here, however, it will be employed
only in order to generate synthetic images from
digital terrain models.

DIGITAL  TERRAIN  MODELS 

Work  on  computer-based  methods  for  cartography, 
prediction  of  side-looking  radar  imagery  for  flight- 
simulators,  automatic  hill-shading  and  machines  that 
analyze  stereo  aerial  photography  has  led  to  the 
development of digital terrain models. These models
are usually in the form of an array of terrain
elevations, z_ij, on a square grid.

Data used for this paper's illustrations was
entered into a computer after manual interpolation
from a contour map and has been used previously in
work on automatic hill-shading [4,5]. It consists
of an array of 175 x 240 elevations on a 100-meter
grid corresponding to a 17.5 km by 24 km region of
Switzerland lying between 7°1' East to 7°15' East
and 46°8.5' North to 46°21.5' North. The vertical
quantization is 10 meters, and elevations range from
410 meters (in the Rhone valley) to 3210 meters (on
the Sommet des Diablerets). The topographic maps
used in the generation of the data are "Les
Diablerets" (No. 1285) and "Dent de Morcles"
(No. 1305), both on a 1:25 000 scale [6]. Extensive
data editing was necessary to remove entry errors;
some minor distortions of elevations may have
resulted.

Manually-entered models of two regions in Canada
have also been used [5,7]. Another set, covering a
region of California, was produced by a digital
simulator of a proposed automatic stereo scanner.
(Output of two experimental automatic stereo
scanners, one built at ETL [8] and one built at RADC
[9], could not be obtained).

The United States Geological Survey [10]
supplies digital terrain models on magnetic tape,
each covering one square degree of the United States,
with a grid spacing of about 208 feet (63.5 m).
These models apparently were produced by
interpolation from hand-traced contours on existing
topographic maps of the 1:250 000 series.
Interpolation to a resolution of .01 inch (0.254 mm)
on the original maps fills in elevations between the
contours spaced 200 feet (60.96 m) vertically. The
final result is smoothed and "generalized" to a
considerable extent; nevertheless, this is the most
prolific source of surface models available to the
public.

THE  GRADIENT 

A gradient has two components, namely the
surface slope along two mutually perpendicular
directions. If the surface height, z, is expressed as
a function of two coordinates x and y, we define
the two components, p and q, of the gradient as the
partial derivatives of z with respect to x and y
respectively. In particular, a Cartesian coordinate
system is erected with the x-axis pointing east,
the y-axis north and the z-axis up. Then, p is
the slope of the surface in the west-to-east
direction, while q is the slope in the south-to-north
direction:

$$p = \frac{\partial z}{\partial x}, \qquad q = \frac{\partial z}{\partial y}$$


One can estimate the gradient from the digital
terrain model using first differences,

$$p = \frac{z_{(i+1)j} - z_{ij}}{\Delta}, \qquad q = \frac{z_{i(j+1)} - z_{ij}}{\Delta}$$

where Δ is the grid-spacing. More sophisticated
schemes are possible [5] for estimating the surface
gradient, but are unnecessary.
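
As a minimal sketch of the first-difference estimate, the following assumes the terrain model is held as a nested list `z[i][j]` with the index i increasing eastward and j increasing northward (an assumption about storage order; the text only fixes the subscripts). The function name and the trimming of the last row and column are illustrative choices.

```python
def estimate_gradient(z, spacing):
    """Return (p, q) arrays of first differences for an elevation grid z[i][j].

    p is the west-to-east slope and q the south-to-north slope. The output
    is one cell smaller in each direction than the input, because the
    forward difference is undefined on the last row and column.
    """
    n, m = len(z), len(z[0])
    p = [[(z[i + 1][j] - z[i][j]) / spacing for j in range(m - 1)]
         for i in range(n - 1)]
    q = [[(z[i][j + 1] - z[i][j]) / spacing for j in range(m - 1)]
         for i in range(n - 1)]
    return p, q


# A tilted plane: elevation rises 30 m per cell eastward, 10 m per cell northward.
elevations = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
p, q = estimate_gradient(elevations, spacing=100.0)
```

With a 100-meter grid and 10-meter elevation quantization, the estimated slopes come out as multiples of 0.1.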

POSITION OF THE LIGHT SOURCES

In order to be able to calculate the
reflectance map, it is necessary to know the location
of the light source. In our case the primary source
is the sun, and its location can be determined
easily by using tables intended for celestial
navigation [11, 12, 13] or by straightforward
computations [14, 15, 16, 17]. In either case, given
the date and time, the azimuth (θ) and the elevation
(φ) of the sun can be found. Here, azimuth is
measured clockwise from North, while elevation is
simply the angle between the sun and horizon (see
figure 2). Now one can erect a unit vector at the
origin of the coordinate system pointing at the
light source.

$$\mathbf{n}_s = [\sin\theta\cos\phi,\ \cos\theta\cos\phi,\ \sin\phi]$$

Since a surface element with gradient (p,q) has a
normal vector n = (-p, -q, 1), we can identify a
particular surface element that happens to be
perpendicular to the direction towards the light
source. Such a surface element will have a surface
normal n_s = (-p_s, -q_s, 1) where p_s = sin(θ) cot(φ)
and q_s = cos(θ) cot(φ). We can use the gradient
(p_s,q_s) as an alternate means of specifying the
position of the source.
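
The unit vector towards the source follows directly from the azimuth and elevation conventions given above (azimuth clockwise from North, x east, y north, z up). The sketch below assumes angles supplied in degrees; the function name is illustrative.

```python
import math


def sun_direction(azimuth_deg, elevation_deg):
    """Unit vector (east, north, up) pointing from the origin towards the sun."""
    theta = math.radians(azimuth_deg)    # azimuth, clockwise from North
    phi = math.radians(elevation_deg)    # elevation above the horizon
    return (math.sin(theta) * math.cos(phi),
            math.cos(theta) * math.cos(phi),
            math.sin(phi))


# A morning sun in the south-east: positive east and negative north components.
east, north, up = sun_direction(153.0, 34.0)
```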

In work on automatic hill-shading, for example,
one uses p_s = -0.707 and q_s = 0.707 to agree with
standard cartographic conventions which require that
the light source be in the North-west at 45°
elevation (θ = 7π/4, φ = π/4) [5].


REFLECTANCE  AS  A FUNCTION  OF  THE  GRADIENT 


Reflectance  of  a surface  can  be  expressed  as  a 
function  of  the  incident  angle  (i),  the  emittance 
angle  (e)  and  the  phase  angle  (g)  (see  figure  3). 

We use a simple, idealized reflectance model for the
surface material,

$$\phi_1(i, e, g) = \rho \cos(i)$$

This reflectance function models a surface which, as
a perfect diffuser, appears equally bright from all
viewing directions. Here, ρ is an "albedo" factor
and the cosine of the incident angle simply accounts
for the foreshortening of the surface element as
seen from the source. More sophisticated models of
surface reflectance are possible [3], but are
unnecessary for this application.

The incident angle is the angle between the
local normal (-p, -q, 1) and the direction to the
light source (-p_s, -q_s, 1). The cosine of this
angle can then be found by taking the dot-product of
the corresponding unit vectors.

$$\cos(i) = \frac{1 + p_s p + q_s q}{\sqrt{1 + p_s^2 + q_s^2}\,\sqrt{1 + p^2 + q^2}}$$


Another reflectance function, similar to that
of materials in the maria of the moon and rocky
planets [2, 18], is a little easier to calculate:

$$\phi_2(p,q) = \rho \cos(i)/\cos(e) = \frac{\rho\,(1 + p_s p + q_s q)}{\sqrt{1 + p_s^2 + q_s^2}}$$

This reflectance function models a surface which
reflects equal amounts of light in all directions.
For small slopes and low sun elevations, it is very
much like the first one, since then (1 + p² + q²)
will be near unity. Both functions were tried and
both produce good alignment -- in fact, it is
difficult to distinguish synthetic images produced
using these two reflectance functions.
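
The two reflectance functions can be written down compactly. The sketch below takes the source gradient (p_s, q_s) as given and does not clip negative values, since the text later refers to self-shadowed elements for which the synthetic intensity falls at or below zero; the function names and the default albedo of one are assumptions.

```python
import math


def phi1(p, q, ps, qs, rho=1.0):
    """Perfect diffuser: rho * cos(i)."""
    numerator = 1.0 + ps * p + qs * q
    denominator = (math.sqrt(1.0 + ps * ps + qs * qs)
                   * math.sqrt(1.0 + p * p + q * q))
    return rho * numerator / denominator


def phi2(p, q, ps, qs, rho=1.0):
    """rho * cos(i) / cos(e) for a viewer looking vertically down."""
    return rho * (1.0 + ps * p + qs * q) / math.sqrt(1.0 + ps * ps + qs * qs)


ps, qs = -0.707, 0.707               # hill-shading source used as an example
facing_source = phi1(ps, qs, ps, qs)  # element perpendicular to the source direction
flat_1 = phi1(0.0, 0.0, ps, qs)       # level ground
flat_2 = phi2(0.0, 0.0, ps, qs)
```

The two functions differ only by the factor sqrt(1 + p² + q²), so they coincide on level ground and diverge slowly as the slope grows.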


SYNTHETIC  IMAGES 

Given  the  projection  equations  that  relate 
points  on  the  objects  to  images  of  said  points, 
and  given  a terrain  model  allowing  calculation  of 
surface  gradient,  it  is  possible  to  predict  how 
an image would look under given illuminating
conditions, provided the reflectance map is
available. We assume simple orthographic projection
here as appropriate for a distant spacecraft looking
vertically down with a narrow angle of view.
Perspective projection would require a few minor
changes in the algorithm.

The  process  of  producing  the  synthetic  image 
is  simple.  An  estimate  of  the  gradient  is  made 
for  each  point  in  the  digital  terrain  model  by 
considering  neighboring  elevations.  The  gradient's 
components,  p and  q,  are  then  used  to  look  up  or 
calculate  the  expected  reflectance.  An  appropri- 
ate intensity  is  placed  in  the  image  at  the  point 
determined  by  the  projection  equation.  All  compu- 
tations are  simple  and  local,  and  the  work  grows 
linearly  with  the  number  of  picture  cells  in  the 
synthetic  image. 
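
The whole process fits in a few lines when orthographic projection is assumed, so that terrain cell (i, j) maps directly to image cell (i, j). The following is a sketch under that assumption, using first differences for the gradient and the perfect-diffuser reflectance; the function and parameter names are illustrative.

```python
import math


def synthetic_image(z, spacing, ps, qs, rho=1.0):
    """Shade an elevation grid z[i][j] for a source with gradient (ps, qs)."""
    n, m = len(z), len(z[0])
    source_norm = math.sqrt(1.0 + ps * ps + qs * qs)
    image = []
    for i in range(n - 1):
        row = []
        for j in range(m - 1):
            # Local gradient from neighboring elevations.
            p = (z[i + 1][j] - z[i][j]) / spacing
            q = (z[i][j + 1] - z[i][j]) / spacing
            # Expected reflectance for this gradient (rho * cos(i)).
            cos_i = (1.0 + ps * p + qs * q) / (
                source_norm * math.sqrt(1.0 + p * p + q * q))
            row.append(rho * cos_i)
        image.append(row)
    return image


flat = synthetic_image([[5, 5, 5], [5, 5, 5], [5, 5, 5]], 100.0, -0.707, 0.707)
# A plane whose gradient equals the source gradient (0.5, -0.25) is fully lit.
plane = [[50 * i - 25 * j for j in range(3)] for i in range(3)]
lit = synthetic_image(plane, 100.0, 0.5, -0.25)
```

Each output cell depends only on three neighboring elevations, which is why the work grows linearly with the number of picture cells.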

Sample synthetic images are shown in figure 1.
The two images are of the same region with
differences in assumed location of the light source.
In figure 1a the sun is at an elevation of 34° and
azimuth of 153°, corresponding to its true position
at 9:52 G.M.T., 1972/Oct/9, while for figure 1b it
was at an elevation of 28° and an azimuth of 223°,
corresponding to its position at 13:48 G.M.T. later
on the same day. The corresponding reflectance maps
are shown in figure 4.

Reflectance maps for the simpler reflectance
function φ_2(p,q) under the same circumstances are
shown in figure 5. Note that near the origin
there is very little difference between φ_1(p,q)
and φ_2(p,q). Since most surface elements in this
terrain model have slopes less than 1/√2, as
shown in the scattergram (see figure 6), synthetic
images produced using these two reflectance maps
are similar.

Since  the  elevation  data  is  typically  rather 
coarsely  quantized  as  a result  of  the  fixed  con- 
tour intervals  on  the  base  map,  p and  q usually 
take  on  only  a few  discrete  values.  In  this  case, 
it  is  convenient  to  establish  a lookup  table  for 
the  reflectance  map  by  simply  precalculating  the 
reflectance for these values. Models with
arbitrarily complex reflectance functions can then be
easily accommodated, as can reflectance functions
determined experimentally and known only for a
discrete set of surface orientations.
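
One way to realize such a lookup table is to key it on the pair of elevation differences, which are integers when the elevations are quantized. This is a sketch only: the dictionary representation, the function names and the default source gradient are assumptions made for illustration.

```python
import math


def lambertian(p, q, ps=-0.707, qs=0.707, rho=1.0):
    """Perfect-diffuser reflectance, used here as the function to tabulate."""
    return rho * (1.0 + ps * p + qs * q) / (
        math.sqrt(1.0 + ps * ps + qs * qs) * math.sqrt(1.0 + p * p + q * q))


def build_reflectance_table(z, spacing, reflectance=lambertian):
    """Precompute reflectance once for each distinct pair of elevation differences."""
    table = {}
    for i in range(len(z) - 1):
        for j in range(len(z[0]) - 1):
            key = (z[i + 1][j] - z[i][j], z[i][j + 1] - z[i][j])
            if key not in table:
                table[key] = reflectance(key[0] / spacing, key[1] / spacing)
    return table


terrain = [[0, 10, 20], [10, 20, 30], [20, 30, 40]]
table = build_reflectance_table(terrain, 100.0)
```

Any callable of (p, q) can be passed as `reflectance`, including one that interpolates experimentally measured values.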

Since the real image was somewhat smoothed in
the process of being reproduced and digitized,
we found it advantageous to perform a similar
smoothing operation on the synthetic images so that
the resolution of the two approximately matched.
Alignment of real and synthetic images was, however,
not dependent on this refinement.


THE  REAL  IMAGE 


In the transformation between the synthetic and the
real image (see TRANSFORMATION PARAMETERS), Δx and Δy
are the shifts in x' and y' respectively, θ is the
angle of rotation and s is the scale factor. Rotation
and scaling take place relative to the centers
(x_0,y_0) and (x'_0,y'_0) of the two images in order
to better decouple the effects of rotation and
scaling from translations. That is, the average shift
in x' and y' induced by a change in rotation angle or
scale is zero.


The image used for this paper's illustrations
is a portion of a LANDSAT [1, 19] image acquired
about 9:52 G.M.T. 1972/Oct/9 (ERTS-1 1078-09555).
We used channel 6 (near infra-red, .7μ to .8μ),
although all four channels (4, 5, 6, & 7) appeared
suitable -- with channel 4 (green, .5μ to .6μ)
being most sensitive to water in the air column
between the satellite and the ground, and channel 7
best at penetrating even thin layers of clouds and
snow. Figure 7 compares an enlargement of the
original transparency with the synthetic image
generated from the terrain model.

A slow-scan vidicon camera (Spatial Data
Systems 108) was used to digitize the positive
transparency of 1:1 000 000 scale. Individual picture
cells were about .1 mm on a side in order to match
roughly the resolution of the synthetic image data.
In recent work, we used a more accurate version
digitized on a drum-scanner (Optronics Photoscan
1000), again with a .1 mm resolution on the film.
Note that the "footprint" of a LANDSAT picture cell
is about 79 x 79 meters [1], compatible with the
resolution of typical digital terrain models. The
digitized image used for the illustrations in this
paper is of lower resolution, however, due to
limitations of the optics and electron-optics of the
digitizing system. In future studies we intend to
use the computer-compatible tapes supplied by EROS
[19].

Alignment  of  real  images  with  terrain  models  is 
possible  even  with  low  quality  image  data,  but  ter- 
rain classification  using  the  aligned  image  and 
digital  surface  model  requires  high  quality  data. 

We generated image output, as for figures 1a,
1b, 7a, and 11, on a drum film-writer (Optronics
Photowriter 1500) and interpolated to alleviate
undesirable raster effects due to the relatively
small number of picture cells in each image.

TRANSFORMATION  PARAMETERS 

Before we can match the synthetic and the real
image, we must determine the nature of the
transformation between them. If the real image truly
is an orthographic projection obtained by looking
straight down, it is possible to describe this
transformation as a combination of a translation, a
rotation and a scale change. If we use x and y to
designate points in the synthetic image and x' and
y' for points in the real image, we may write:


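$$\begin{bmatrix} x' - x'_0 \\ y' - y'_0 \end{bmatrix} = s \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}$$
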

In our case, the available terrain model
restricts the size of the synthetic image. The area
over which matching of the two will be performed
is thus always fixed by the border of the synthetic
image. The geometry of the coordinate
transformation is illustrated in figure 8.
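
The coordinate transformation can be sketched as a small function mapping a synthetic-image point to the real image. The rotation matrix is taken as [cos θ, sin θ; -sin θ, cos θ], with rotation and scaling about the image centers; the function and parameter names are assumptions for illustration.

```python
import math


def synthetic_to_real(x, y, dx, dy, scale, theta, center, center_real):
    """Map (x, y) in the synthetic image to (x', y') in the real image."""
    x0, y0 = center
    x0_real, y0_real = center_real
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate and scale about the synthetic center, then shift.
    x_real = x0_real + scale * (cos_t * (x - x0) + sin_t * (y - y0)) + dx
    y_real = y0_real + scale * (-sin_t * (x - x0) + cos_t * (y - y0)) + dy
    return x_real, y_real


# The center itself moves only by the translation, whatever the rotation or scale.
cx, cy = synthetic_to_real(10, 20, dx=3, dy=-4, scale=1.7, theta=0.3,
                           center=(10, 20), center_real=(50, 60))
# A quarter-turn with scale 2 applied to a point one cell east of the center.
rx, ry = synthetic_to_real(11, 20, dx=0, dy=0, scale=2,
                           theta=1.5707963267948966,
                           center=(10, 20), center_real=(50, 60))
```

Keeping the centers in the formula is what makes a change in rotation or scale leave the average shift unchanged.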

CHOICE  OF  SIMILARITY  MEASURE 

In order to determine the best set of
transformation parameters (Δx, Δy, s, θ), one must be
able to measure how closely the images match for
a particular choice of parameter values. Let S_ij
be the intensity of the synthetic image at the
i-th picture cell across in the j-th row from the
bottom of the image, and define R_ij similarly for
the real image. Because of the nature of the
coordinate transformation, we cannot expect that
the point in the real image corresponding to the
point (i,j) in the synthetic image will fall
precisely on one of the picture cells. Consequently,
S_ij will have to be compared with R(x',y'), which
is interpolated from the array of real image
intensities. Here (x',y') is obtained from (i,j) by
the transformation described in the previous
section.

One measure of difference between the two
images may be obtained by summing the absolute
values of differences over the whole array.
Alternately, one might sum the squares of the
differences:

$$\sum_{i=1}^{n} \sum_{j=1}^{m} \left(S_{ij} - R(x',y')\right)^2$$

This measure will be minimal for exact alignment
of the images. Expanding the square, one decomposes
this result into three terms, the first being the
sum of S_ij², the last the sum of R²(x',y').
The first is constant, since we always use the full
synthetic image; the last varies slowly as different
regions of the real image are covered. The
sum of S_ij R(x',y') is interesting since this term
varies most rapidly with changes in the
transformation. In fact, a very useful measure of the
similarity of the two images is the correlation:

$$\sum_{i=1}^{n} \sum_{j=1}^{m} S_{ij}\, R(x',y')$$

This  measure  will  be  maximal  when  the  images  are 
properly  aligned.  It  has  the  advantage  of  being 
relatively  insensitive  to  constant  multiplying 
factors.  These  may  arise  in  the  real  image  due  to 
changes  in  the  adjustment  of  the  optical  or  elec- 
tronic systems. 
Note that image intensity is the product of a
constant factor which depends on the details of the
imaging system (such as the lens opening and the
focal length), the intensity of the illumination
striking the surface, and the reflectance of the
surface. We assume all but the last factor is
constant and thus speak interchangeably of changes in
surface reflectance and changes in image
intensities.
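
The relation between the sum of squared differences and the correlation is easy to check numerically. In this sketch both arrays are assumed to have been resampled onto the same grid already, so that element (i, j) of one corresponds to element (i, j) of the other; the function names are illustrative.

```python
def sum_squared_difference(s, r):
    """Sum over all cells of (S_ij - R_ij)^2."""
    return sum((a - b) ** 2
               for row_s, row_r in zip(s, r)
               for a, b in zip(row_s, row_r))


def correlation(s, r):
    """Sum over all cells of S_ij * R_ij (the raw, unnormalized correlation)."""
    return sum(a * b
               for row_s, row_r in zip(s, r)
               for a, b in zip(row_s, row_r))


synthetic = [[1, 2], [3, 4]]
real = [[1, 2], [3, 5]]
ssd = sum_squared_difference(synthetic, real)
# Expanding the square gives three terms: sum S^2 - 2 sum S*R + sum R^2.
expanded = (correlation(synthetic, synthetic)
            - 2 * correlation(synthetic, real)
            + correlation(real, real))
```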


INTERPOLATION  SCHEME 

The real image intensity at the point (x',y')
has to be estimated from the array of known image
intensities. If we let k = ⌊x'⌋ and ℓ = ⌊y'⌋ be
the integer parts of x' and y', then R(x',y') can
be estimated from R_kℓ, R_(k+1)ℓ, R_k(ℓ+1) and
R_(k+1)(ℓ+1) by linear interpolation (see figure 9).

$$R_{\ell}(x') = (k+1-x')\,R_{k\ell} + (x'-k)\,R_{(k+1)\ell}$$

$$R_{\ell+1}(x') = (k+1-x')\,R_{k(\ell+1)} + (x'-k)\,R_{(k+1)(\ell+1)}$$

$$R(x',y') = (\ell+1-y')\,R_{\ell}(x') + (y'-\ell)\,R_{\ell+1}(x')$$

The answer is independent of the order of
interpolation and, in fact, corresponds to the result
obtained by fitting a polynomial of the form
(a + bx' + cy' + dx'y') to the values at the four
indicated points. Alignment is not impaired,
however, when nearest neighbor interpolation is used
instead. This may be a result of the smoothing of
the real image as previously described.
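
The interpolation can be sketched as follows, assuming the real image is stored as a nested list `r[k][l]` with the first index along x' and the second along y' (an assumption about storage order). No bounds handling is included; the caller is assumed to keep the point at least one cell inside the array.

```python
import math


def interpolate(r, x, y):
    """Linear interpolation of r at the real-valued point (x, y).

    Requires 0 <= x < len(r) - 1 and 0 <= y < len(r[0]) - 1.
    """
    k, l = math.floor(x), math.floor(y)
    fx, fy = x - k, y - l
    # Interpolate along x' in the two neighboring rows, then along y'.
    lower = (1 - fx) * r[k][l] + fx * r[k + 1][l]
    upper = (1 - fx) * r[k][l + 1] + fx * r[k + 1][l + 1]
    return (1 - fy) * lower + fy * upper


samples = [[0, 10], [20, 30]]
center_value = interpolate(samples, 0.5, 0.5)
off_center_value = interpolate(samples, 0.25, 0.75)
```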

CHOICE  OF  NORMALIZATION  METHOD 

High output may result as the transformation
is changed simply because the region of the real
image used happens to have a high average
gray-level. Spurious background slopes and false
maxima may then result if the raw correlation is
used. For this and other reasons, it is convenient to
normalize. One approach essentially amounts to
dividing each of the two images by its standard
deviation; alternately, one can divide the raw
correlation by

$$\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m} S_{ij}^2 \;\; \sum_{i=1}^{n}\sum_{j=1}^{m} R^2(x',y')}$$

An additional advantage is that a perfect match of
the two images now corresponds to a normalized
correlation of one. An alternate method uses a
normalization factor that is slightly easier to
compute and which has certain advantages if the
standard deviations of the two images are similar.
Instead of using the geometric mean, Hans Moravec
proposes the arithmetic mean [20]:

$$\frac{1}{2}\left[\sum_{i=1}^{n}\sum_{j=1}^{m} S_{ij}^2 + \sum_{i=1}^{n}\sum_{j=1}^{m} R^2(x',y')\right]$$

The first term need not be recomputed, since the
full synthetic image is always used. Since we
found the alignment procedure insensitive to the
choice of normalization method, we used the second
in our illustrations.
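
The two normalizations can be compared on a small example. This sketch assumes the normalizing factors are the geometric mean and the arithmetic mean of the two sums of squared intensities, computed without subtracting the image means; the arrays are assumed to be resampled onto a common grid, and the function name is illustrative.

```python
import math


def normalized_correlations(s, r):
    """Return (geometric-mean, arithmetic-mean) normalized correlations."""
    flat_s = [v for row in s for v in row]
    flat_r = [v for row in r for v in row]
    raw = sum(a * b for a, b in zip(flat_s, flat_r))
    energy_s = sum(a * a for a in flat_s)   # constant for a fixed synthetic image
    energy_r = sum(b * b for b in flat_r)
    geometric = raw / math.sqrt(energy_s * energy_r)
    arithmetic = raw / ((energy_s + energy_r) / 2.0)
    return geometric, arithmetic


synthetic = [[1, 2], [3, 4]]
same_g, same_a = normalized_correlations(synthetic, synthetic)
doubled = [[2, 4], [6, 8]]
gain_g, gain_a = normalized_correlations(synthetic, doubled)
```

The geometric-mean form stays at one when the real image is merely brighter by a constant factor, while the arithmetic-mean form drops below one; the two agree closely when the image energies are similar.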


LOCATING  THE  BEST  MATCH 

Now that we have shown how to calculate a
good similarity measure, we need an efficient method
for finding the best possible transformation
parameters. Exhaustive search is clearly out of
the question. Fortunately, the similarity measure
allows the use of standard hill-climbing techniques.
This is because it tends to vary smoothly with
changes in parameters and often is monotonic (at
least for small ranges of the parameters).

When images are not seriously misaligned,
profiles of the similarity measure usually are
unimodal with a well-defined peak when plotted against
one of the four parameters of the transformation
(see figure 10). It is possible to optimize each
parameter in turn, using simple search techniques
in one dimension. The process can then be
iterated. A few passes of this process typically
produce convergence. (More sophisticated schemes
could reduce the amount of computation, but were
not explored).
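
Optimizing one parameter at a time can be sketched as a simple coordinate-wise hill climb. The fixed step sizes, the stopping rule and the function names below are assumptions; the text does not specify which one-dimensional search was used.

```python
def coordinate_ascent(similarity, start, steps, passes=5):
    """Maximize similarity(params) by stepping one parameter at a time."""
    params = list(start)
    best = similarity(params)
    for _ in range(passes):
        improved = False
        for index, step in enumerate(steps):
            for direction in (+1, -1):
                # Keep stepping this parameter while the measure increases.
                while True:
                    trial = list(params)
                    trial[index] += direction * step
                    value = similarity(trial)
                    if value <= best:
                        break
                    params, best, improved = trial, value, True
        if not improved:
            break  # a full pass changed nothing: converged for these steps
    return params, best


def toy_measure(v):
    """A unimodal stand-in for the similarity measure, peaked at (3, -2)."""
    return -((v[0] - 3) ** 2 + (v[1] + 2) ** 2)


found, peak = coordinate_ascent(toy_measure, start=[0, 0], steps=[1, 1])
```

In the registration problem `params` would hold the two shifts, the rotation angle and the scale factor, and `similarity` would evaluate the normalized correlation for that transformation.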

When the images are initially not reasonably
aligned, more care has to be taken to avoid being
trapped by local maxima. Solving this problem
using more extensive search leads to prohibitively
lengthy computations. We need a way of reducing
the cost of comparing images.

USING REDUCED IMAGES

One way to reduce the computation is to use
only sub-images or "windows" extracted from the
original images. This is useful for fine matching,
but is not satisfactory here because of the lack
of global context.

Alternately, one might use sampled images
obtained by picking one image intensity to represent
a small block of image intensities. This is
satisfactory as long as the original images are
smoothed and do not have any high resolution
features. If this is not the case, aliasing due to
under-sampling will produce images of poor quality
unsuitable for comparisons.

One  solution  to  this  dilemma  is  to  low-pass 
filter  the  images  before  sampling.  A simple  ap- 
proximation to  this  process  uses  averages  of  small 
blocks  of  image  intensities.  The  easiest  method 
involves  making  one  image  intensity  in  the  reduced 
image  equal  to  the  average  of  a 2 x 2 block  of  in- 
tensities in  the  original  image.  This  technique 
can  be  applied  repeatedly  to  produce  ever  smaller 
images  and  has  been  used  in  a number  of  other  ap- 
plications [20,21]. 
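
The 2 x 2 averaging step and its repeated application can be sketched as follows. Images are nested lists, and odd trailing rows or columns are simply dropped, which is an assumption; the function names are illustrative.

```python
def reduce_by_two(image):
    """Replace each 2 x 2 block of intensities by its average."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * i][2 * j] + image[2 * i + 1][2 * j]
              + image[2 * i][2 * j + 1] + image[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(cols)]
            for i in range(rows)]


def build_pyramid(image, levels):
    """Return [full image, half size, quarter size, ...] with `levels` reductions."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(reduce_by_two(pyramid[-1]))
    return pyramid


full = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
pyramid = build_pyramid(full, levels=2)
```

Each reduction divides the number of picture cells by four, so a search over the smallest level costs a small fraction of a search at full resolution.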

The results of the application of this
reduction process to real and synthetic images can be
seen in figure 11. First, the most highly reduced
image is used to get coarse alignment. In this
case extensive search in the parameter space is
permissible, since the number of picture cells in
the image to be matched is very small. This coarse
alignment is then refined using the next larger
reduced images (with four times as many picture
cells). Finally, the full resolution images are
used directly to fine tune the alignment. False
local maxima are, fortunately, much rarer with the
highly reduced pictures, thus further speeding the
search process. It is as if the high resolution
features are the ones leading to false local
maxima.

We  found  it  best,  by  the  way,  to  determine 
good  values  for  the  translations  first,  then  ro- 
tation and,  finally,  scale  change.  Naturally 
when  searching  for  a peak  value  as  a function  of 
one  parameter,  the  best  values  found  so  far  for  the 
other  parameters  are  used. 

RESULTS  OF  REGISTRATION  EXPERIMENTS 

We matched the real and synthetic images using
the similarity measure and search technique just
described. We tried several combinations of
implementation details, and in all cases achieved
alignment which corresponded to a very high value of
the normalized correlation, very close to that
determined manually. For the images shown here, the
normalized correlation coefficient reaches .92 for
optimum alignment, and the match is such that no
features are more than two picture cells from the
expected place, with almost all closer than one.
(The major errors in position appear to be due to
perspective distortion, as described later, with
which the process is not designed to cope). The
accuracy with which translation, rotation and
scaling were determined can be estimated from the
above statement.

Overall,  the  process  appears  quite  successful, 
even  with  degraded  data  and  over  a wide  range  of 
choices  of  implementation  details.  Details  of  in- 
terpolation, normalization,  search  technique,  and 
even  the  reflectance  map  do  not  matter  a great  deal. 

Having stated that alignment can be accurately
achieved, we may now ask how similar the real and
synthetic images are. There are a number of
uninformative numerical ways of answering this
question. Graphic illustrations, such as images of
the differences between the real and synthetic image,
are more easily understood. For example, we plot real
image intensity versus synthetic image intensity in
figure 12. Although one might expect a straight
line of slope one, the scattergram shows clusters
of points, some near the expected line, some not.

The  cluster  of  points  indicated  by  the  arrow 
labelled  A (figure  12)  corresponds  chiefly  to  image 
points  showing  cloud  or  snow  cover,  with  intensity 
sufficient  to  saturate  the  image  digitizer.  Here 
the  real  image  intensity  exceeds  the  synthetic  image 
intensity. Arrow B indicates the cluster of points
which corresponds to shadowed points. Those near
the vertical axis and to its left come from
self-shadowed surface elements, while those to the
right are regions lying inside shadows cast by
other portions of the surface. These cast shadows
are not simulated in the synthetic image at the
moment. Here the synthetic image is brighter than
the real image. Finally, the cluster of points
indicated by arrow C arises from the valley floor,
which covers a fairly large area and has
essentially zero gradient. As a result, the
synthetic image has constant intensity here, while
the real image shows both darker features (such as
the river) and brighter ones (such as those due to
the cities and vegetation cover). Most of the
ground cover in the valley appears to have higher
"albedo" than the bare rock which is exposed in the
higher regions, as suggested by the position of
this cluster above the line of slope one.

If  one  were  to  remove  these  three  clusters  of 
points,  the  remainder  would  form  one  elongated 
cluster  with  major  axis  at  about  45°.  This  shows 
that,  while  there  may  not  be  an  accurate  point-by- 
point equality of intensities, there is a high
correlation  between  intensities  in  the  real  and 
synthetic  images. 

Note, by the way, that no quantization of
intensity is apparent in these scattergrams. This
is  a result  of  the  smoothing  applied  to  the  syn- 
thetic image  and  the  interpolation  used  on  the  real 
image.  Without  smoothing,  the  synthetic  image  has 
fairly  coarse  quantization  levels  because  of  the 
coarse  quantization  of  elevations  as  indicated 
earlier.  Without  interpolation,  the  real  image, 
too,  has  fairly  coarse  quantization  due  to  the 
image  digitization  procedure. 

Finally,  note  that  we  achieve  our  goal  of  ob- 
taining accurate  alignment.  Detailed  matching  of 
synthetic  and  real  image  intensity  is  a new  prob- 
lem which  can  be  approached  now  that  the  problem 
of  image  registration  has  been  solved. 

REASONS FOR REMAINING INTENSITY MISMATCHES

We  may  need  more  accurate  prediction  of  image 
intensities  for  some  applications  of  aligned  image 
intensity  and  surface  gradient  information.  Thus, 
it  is  useful  to  analyze  the  reasons  for  the  dif- 
ferences noted  between  the  synthetic  and  the  real 
image:

Satellite  Imaging.  Geometric  distortion  in 
satellite  imagery  may  be  small  but  noticeable 
and  traceable  to  several  sources  [1].  Shifts 
of several hundred meters can arise. Perspective
distortion for the image used here amounted to
about 200 meters on the highest peaks, for
example.

Intensity  distortions  are  caused  by  the  fact 
that  scan  lines  are  not  all  sensed  by  the  same 
sensor  [1].  Electronic  noise  and  atmospheric 
attenuation,  dispersion  and  scattering  are  also 
important  for  some  of  the  spectral  bands. 

Digitization. When film transparencies are
digitized, the resolution limitations of the
optics and the nonlinear response of the film
are important. Larger errors are introduced
if an electron-optic device is used. These
devices typically introduce geometric distortions,
nonlinearity and nonuniformity of response.
Picture cells may not be square and axes not
perpendicular.

Terrain Model. Inaccuracies due to manual entry
and editing are common in present day digital
terrain models. In addition, the contour maps
commonly used as source information are already
liberally "generalized" and smoothed by the
cartographer.  Finally,  the  estimation  of  sur- 
face gradient  is  likely  to  be  crude,  since  the 
data  in  such  maps  is  intended  to  be  accurate 
in  elevation,  not  in  the  partial  derivatives  of 
elevation.  Coarse  quantization  of  the  gradient 
is  one  effect  of  this  that  has  already  been  men- 
tioned. We  hope  that  terrain  models  produced 
by  automatic  stereo  comparators  in  the  future 
will  not  suffer  from  all  of  these  shortcomings. 

Reflectance.  The  assumption  of  uniform  reflec- 
tance and  the  modelling  of  reflectance  by  means 
of  the  simple,  rather  ad  hoc  functions  used  here 
contribute  errors  to  the  synthetic  image.  More 
seriously,  cast  shadows  are  not  modelled.  Illum- 
ination from  the  sky  and  mutual  illumination  be- 
tween mountain  slopes  are  less  important.  In- 
cluding even  crude  surface  cover  information  im- 
proves the  match  between  the  synthetic  and  the 
real  image. 

Water.  In  its  various  forms,  water  can  produce 
large  mismatches  since,  at  least  for  the  shorter 
wavelengths,  moisture  in  the  atmosphere  contri- 
butes to  attenuation  and  scattering  of  light. 

In  liquid  form,  water  produces  bright,  obscuring 
areas  in  the  form  of  clouds  and  dark  regions 
such  as  rivers  and  lakes.  Snow  and  ice  provide 
highly  reflective  areas  which  produce  large  mis- 
matches. 

In  view  of  all  these  factors,  it  is  surprising  that 
a match  as  good  as  that  in  figure  12  is  possible. 

FURTHER  IMPROVEMENT  OF  THE  SYNTHETIC  IMAGE 

Using the original digital tapes [19] would
eliminate the errors we believe are due to the
digitization process. Most of the geometric
distortion can be dealt with as well [1]. Further
match improvement must come from better synthetic
images.

The  most  significant  step  here  would  be  the  in- 
clusion of  surface  cover  information.  Even  a 
coarse  categorization  into  materials  of  grossly 
differing  "albedo"  might  be  useful.  Conversely,  of 
course,  one  can  exploit  the  difference  in  intensi- 
ties between  the  real  and  the  synthetic  image  to 
estimate  surface  reflectance.  Since  alignment  is 
possible  without  accurate  reflectance  models,  the 
ratio  of  real  to  synthetic  intensity  (a  measure  akin 
to  albedo)  can  be  used  in  terrain  classification, 
particularly  if  it  is  calculated  for  each  of  the 
spectral  bands. 
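
The ratio mentioned above is simple to compute once the images are aligned. The sketch below is a hedged illustration: the function name `albedo_ratio`, the cutoff `min_synthetic` and the sample values are assumptions, not part of the original experiments.

```python
import numpy as np


def albedo_ratio(real, synthetic, min_synthetic=0.05):
    """Per-pixel ratio real/synthetic; NaN where the ratio is unreliable."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    ratio = np.full(real.shape, np.nan)
    # Skip shadowed or nearly dark synthetic pixels to avoid dividing by ~0.
    ok = synthetic > min_synthetic
    ratio[ok] = real[ok] / synthetic[ok]
    return ratio


synthetic = np.array([[0.5, 0.5], [0.8, 0.0]])
real = np.array([[0.25, 0.75], [0.8, 0.1]])
ratio = albedo_ratio(real, synthetic)
```

Ratios well above or below one then mark surface cover that is brighter or darker than the uniform reflectance assumed when the synthetic image was generated.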

Cast  shadows  are  fairly  easy  to  deal  with,  if 
we  implement  a hidden-surface  algorithm  to  determine 


which  surface  elements  can  be  seen  from  the  source. 
This  computation  can  be  done  fairly  quickly  using 
a well  known  algorithm  [22].  Sky  illumination  in 
shadowed  areas  presents  no  great  stumbling  block 
in  this  regard. 
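
To give a feeling for how little computation cast shadows require, the sketch below marks shadowed cells along a single elevation profile. It is not the algorithm of [22]; it is a simplified running-maximum test that assumes the sun lies in the plane of the profile, and the function name and sample terrain are invented for the example.

```python
import math


def cast_shadow_profile(heights, spacing, sun_elevation_deg):
    """Flag cells of a 1-D elevation profile that lie in cast shadow.

    The sun is assumed to lie in the plane of the profile, toward
    decreasing index, at the given elevation above the horizon.
    """
    slope = math.tan(math.radians(sun_elevation_deg))
    shadowed = []
    highest = -math.inf
    for i, h in enumerate(heights):
        # Cell j (< i) blocks cell i when h_j > h_i + (i - j)*spacing*slope,
        # i.e. when h_j + j*spacing*slope > h_i + i*spacing*slope.
        t = h + i * spacing * slope
        shadowed.append(t < highest)
        highest = max(highest, t)
    return shadowed


# A 100 m ridge at index 2 with the sun 45 degrees above the horizon;
# cells are 40 m apart, so the shadow extends about 100 m behind the ridge.
profile = [0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0]
shadow = cast_shadow_profile(profile, 40.0, 45.0)
```

For a full terrain model the same test would be applied along every line parallel to the sun azimuth, which requires resampling the elevation array along that direction.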

The  quality  of  terrain  models  is  likely  to  in- 
crease most  rapidly  when  fully  automatic  scanning 
stereo  comparators  become  available.  Until  then, 
hand-editing  of  hand-traced  information  will  have 
to  be  used  to  limit  the  errors  in  the  estimation 
of  gradient. 

One  notion  that  shows  great  promise  is  that 
of  masks  derived  from  both  the  terrain  model  and 
the  real  image.  The  masks  are  used  to  limit  the 
correlation  operation  to  those  areas  which  are  not 
as  likely  to  lead  to  mismatches.  Areas  of  very 
high  intensity  in  the  image,  for  example,  may  sug- 
gest cloud  or  snow  cover,  and  ought  not  to  be  used 
in  the  matching  operation.  Similarly,  it  may  be 
that  areas  of  certain  elevations  and  surface 
gradients  are  better  than  others  for  matching.  The 
correlation  can  be  improved  considerably  if  we  use 
only  those  regions  above  the  elevations  at  which 
dense  vegetation  exists  and  below  the  elevation 
at  which  snow  may  have  accumulated.  A slightly 
more  sophisticated  method  would  note  that  snow 
tends  to  remain  longer  on  north-facing  slopes. 
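
A mask of the kind described here might be built and applied as follows. This is a sketch under stated assumptions: the saturation level, the elevation band and the helper names `matching_mask` and `masked_correlation` are hypothetical, and the correlation is the zero-mean normalized form rather than necessarily the measure used in the experiments.

```python
import numpy as np


def matching_mask(real, elevation, saturation, low, high):
    """True where a pixel may take part in the correlation."""
    real = np.asarray(real, dtype=float)
    elevation = np.asarray(elevation, dtype=float)
    not_saturated = real < saturation          # likely cloud or snow otherwise
    in_band = (elevation > low) & (elevation < high)
    return not_saturated & in_band


def masked_correlation(real, synthetic, mask):
    """Normalized correlation computed over the masked pixels only."""
    r = np.asarray(real, dtype=float)[mask]
    s = np.asarray(synthetic, dtype=float)[mask]
    r = r - r.mean()
    s = s - s.mean()
    return float((r * s).sum() / np.sqrt((r * r).sum() * (s * s).sum()))


elevation = np.array([[500.0, 1500.0, 1800.0], [2000.0, 2200.0, 3500.0]])
synthetic = np.array([[0.2, 0.4, 0.6], [0.5, 0.7, 0.3]])
real = synthetic.copy()
real[0, 0] = 0.9   # vegetation in the valley, brighter than predicted
real[1, 2] = 1.0   # snow on the summit, saturating the digitizer
mask = matching_mask(real, elevation, saturation=0.95, low=1000.0, high=3000.0)
score = masked_correlation(real, synthetic, mask)
```

In this toy case the two mismatching pixels are excluded by the mask, so the remaining pixels correlate perfectly.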

THE  INFLUENCE  OF  SUN  ELEVATION 

Aerial  or  satellite  photographs  obtained  when 
the  sun  is  low  in  the  sky  show  the  surface  topog- 
raphy most  clearly.  In  this  case,  the  surface 
gradient  is  the  major  factor  in  determining  surface 
reflectance.  Ridges  and  valleys  stand  out  in 
stark  relief,  and  one  gets  an  immediate  impression 
of  the  shape  of  portions  of  the  surface.  Converse- 
ly, variations  in  surface  cover  tend  to  be  most 
important  when  the  sun  is  high  in  the  sky.  Photo- 
graphs obtained  under  such  conditions  are  difficult 
to  align  with  a topographic  map  --  at  least  for  a 
beginner. 

What  is  the  sun  elevation  for  which  these  two 
effects  are  about  equally  important?  Finding  this 
value  will  allow  us  to  separate  the  imaging  situa- 
tions into  two  classes:  those  which  are  more 
suited  for  determining  topography  and  those  which 
are  more  conducive  to  terrain  classification  suc- 
cess. We  will  use  a simple  model  of  surface  re- 
flectance. Suppose  that  the  surface  has  materials 
varying in "albedo" between $\rho_1$ and $\rho_2$,
with $\rho_1 > \rho_2$. Next, suppose that the
surface slopes are all less than or equal to
$\tan(e)$. The incident angles will vary between
$e - (90^\circ - \phi)$ and $e + (90^\circ - \phi)$,
where $\phi$ is the elevation of the sun. If we use
the same simple reflectance function employed
before, we find that for the two influences on
reflectance to be just equal:

$$\rho_1 \cos(e + 90^\circ - \phi) = \rho_2 \cos(e - 90^\circ + \phi)$$

Expanding the cosines and rearranging this equation
leads to:

$$\tan(e) = \frac{\rho_1 - \rho_2}{\rho_1 + \rho_2} \tan(\phi)$$

When, for example, the surface materials have
reflectances covering a range of two to one and the
sun elevation is 35°, then regions with surface
slopes above approximately 0.23 ($e \approx 13^\circ$)
will have image intensities affected more by
surface gradient than by surface cover. Conversely,
flatter surfaces will result in images more
affected by variations in surface cover than by the
area's topography.
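
The numbers in this example can be checked with a few lines of code. The sketch assumes the relation obtained by expanding the cosines in the equal-influence condition, namely that the critical slope equals the tangent of the sun elevation multiplied by (r - 1)/(r + 1), where r is the ratio of the largest to the smallest "albedo". The function name is made up for the illustration.

```python
import math


def critical_slope(albedo_ratio, sun_elevation_deg):
    """Slope tan(e) at which gradient and cover influences are equal.

    albedo_ratio is the ratio (>= 1) of the largest to the smallest
    surface "albedo".
    """
    k = (albedo_ratio - 1.0) / (albedo_ratio + 1.0)
    return k * math.tan(math.radians(sun_elevation_deg))


slope = critical_slope(2.0, 35.0)
angle_deg = math.degrees(math.atan(slope))
```

With a reflectance range of two to one and a sun elevation of 35 degrees this gives a slope of about 0.23, or roughly 13 degrees, in agreement with the figures quoted above.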

One possible conclusion is that alignment of
images with terrain models is feasible without
detailed knowledge of the surface materials if the
sun elevation is small and the surface slopes are
high. Since LANDSAT images are taken at about
9:30 local solar time [1], the first condition is
satisfied and alignment of these images is possible
even in only lightly undulating terrain.

Figure 1b. Early afternoon (13:48 G.M.T.) synthetic image.

Figure 7a. Synthetic image used in the alignment experiments.

Figure 8. Coordinate transformation from synthetic image to real image.

Figure 9. Simple interpolation scheme applied to the real image array.

Figure 10c. Variation of similarity measure with rotation.

Figure 10d. Variation of similarity measure with scale changes.

Figure 12a. Scattergram of real image intensities versus synthetic image
intensities based on $\Phi_1(p,q)$.

Figure 12b. Scattergram of real image intensities versus synthetic image

ABSTRACT

Clustering  performance  of  the  Coleman 
segmentor  has  been  further  evaluated  from 
a pattern  recognition  viewpoint.  Quite 
effective  feature  rejection  to  enhance 
tighter,  more  homogeneous  clusters  (image 
segments)  results  in  large  dimensionality 
reduction with equivalent clustering
performance.


INTRODUCTION 

Automatic, bottom-up image segmentation requiring
no human assistance has been developed by
Coleman [1] for the Image Understanding
program. The system utilizes pattern
recognition  techniques  in  N dimensional 
vector  space  to  perform  decorrelation, 
clustering,  feature  rejection  and  ultimate 
segmentation.  The  only  underlying  assump- 
tion for  the  process  is  that  homogeneous 
clusters  in  N space  are  representative  of 
homogeneous  regions  of  an  image  in  percep- 
tual space.  The  system  is  designed  to 
operate  with  any  set  of  computable  features 
and  will  automatically  select  the  best 
subset  of  those  features  to  develop  tightly 
clustered  homogeneous  regions  in  N space 
which  then  serve  to  define  the  segmentation 
of  the  original  image.  In  the  interest  of 
smart  sensor  implementation,  the  system  has 
been designed for frame-to-frame segmentation
for real-time television-like sensors.

SEGMENTOR CONFIGURATION

Figure  1 presents  a block  diagram 
representative  of  the  system  design  of  the 
segmentor. The first component of the
system is the "feature computation" phase.
This process computes the features that the
designer feels will be relevant for effective
clustering. Features can be computed at any
resolution, up to every pixel if desired. Because the
features  to  be  computed  are  defined  by  the 
user,  it  is  at  this  phase  that  human  in- 
tuitive and  design  processes  are  brought  to 
bear  on  the  segmentation  problems.  Once 
the  human  defined  features  are  computed, 


the  system  then  becomes  automatic  for  sub- 
sequent optimization.  The  computed 
features  (N  of  these)  then  define  an 
N dimensional  coordinate  system  wherein 
each  pixel  will  subsequently  represent  a 
point in N space. Typical features that
might be computed are listed in Table 1.
Brightness  amplitudes  for  monochrome, 
color,  and  multispectral  scenes  are  ob- 
vious candidate  features.  Texture 
features  might  be  delineated  by  edges  or 
other  spatial  frequency  processors  and  are 
listed  in  the  table.  Finally,  nonlinear 
spatial  filtering  processes  might  also  be 
useful  for  segmentation  and  this  class  of 
features  is  listed  as  well.  Obviously, 
as  humans  we  can  continue  to  generate  more 
features  as  we  become  more  familiar  with 
our  processing  goals.  The  only  point  to 
be  made  here  is  that  the  feature  computa- 
tion box  will  only  be  as  clever  as  its 
designer.  Subsequent  to  this  phase,  all 
processes  become  automatic.  However,  note 
how  simple  it  is  to  generate  fairly  large 
dimensional  vector  spaces  at  the  front  end 
of  the  system.  It  is  because  of  man's 
propensity  to  generate  so  much  data  that 
subsequent  optimization  and  feature  re- 
jection procedures  must  be  developed  to 
efficiently  and  economically  process  such 
data  for  ultimate  segmentation  purposes. 

Returning  to  figure  1 we  see  that  the 
next  phase  of  the  segmentor  configuration 
is  a straightforward  vector  space  rotation 
(unitary  transformation)  defined  by  the 
eigenvectors  of  the  overall  covariance 
matrix  between  all  N features  computed 
over  the  entire  image.  The  objective  of 
this  phase  is  to  decorrelate  the  features 
such  that  clustering  is  implemented  in  N 
dimensional  decorrelated  space.  In  this 
way  good  features  can  be  selected  individ- 
ually and  bad  features  rejected  individu- 
ally without  concern  as  to  correlation 
properties  with  other  features.  This  will 
allow  efficient  compaction  of  good 
clustering  features  into  a few  parameters 
thereby  providing  a large  dimensionality 
reduction. However, it is important to
realize that feature reduction does not
occur immediately following the rotation
process but only subsequent to clustering
analysis.

This  brings  us  to  the  next  step  in 
the  system  which  is  a k-means  clustering 
algorithm  in  N dimensional  rotated  space. 
This  algorithm  converges  to  a set  of  k-mean 
points  describing  the  best  assignment  of 
pixel  features  to  k-clusters  such  that  the 
sum  of  within  cluster  distances  is  the 
smallest.  The  disadvantage  of  the 
algorithm  is  that  it  requires  knowledge  of 
the  number  of  clusters,  k,  in  advance. 
Clearly this is unknown, and consequently
the k-means clustering routine must be
run for all reasonable values of
k (i.e., k = 1, ..., 16). Subsequent blocks
in the figure are designed to determine the
best number of clusters and the best
features to provide the tightest cluster
distributions.
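
The rotation and clustering steps described above can be sketched in a few lines. This is an illustrative reconstruction only, not Coleman's implementation: the function names, the random initialization of the means, the iteration limit and the synthetic two-feature data are all assumptions.

```python
import numpy as np


def decorrelate(features):
    """Rotate (pixels x N) features onto the covariance eigenvectors."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)   # orthonormal columns
    return centered @ eigvecs


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns (labels, means)."""
    rng = np.random.default_rng(seed)
    means = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest mean.
        dist = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_means = means.copy()
        for j in range(k):
            if np.any(labels == j):      # keep the old mean if a cluster empties
                new_means[j] = points[labels == j].mean(axis=0)
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels, means


# Two synthetic "regions" with strongly correlated features (made-up data).
rng = np.random.default_rng(1)
base = rng.standard_normal((200, 1))
features = np.hstack([base, base + 0.1 * rng.standard_normal((200, 1))])
features[100:] += 10.0
rotated = decorrelate(features)
labels, means = kmeans(rotated, k=2)
```

In the segmentor this clustering would be repeated for each candidate value of k, with the later blocks choosing among the results.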

Once the k-means cluster algorithm
has converged to the minimum spread of
points in N space, a fidelity measure, $\beta$,
is computed to establish the tightness of
the points within the clusters and the
degree of spread or separateness of the
clusters one from another. This fidelity
measure is given by

$$\beta(k) = \operatorname{tr}[S_W(k)] \, \operatorname{tr}[S_B(k)]$$

where $S_W(k)$ is the within cluster
scatter matrix and $S_B(k)$ is the between
cluster scatter matrix [2]. It can be
shown that $\beta$ is everywhere nonnegative,
has at least one maximum, and achieves that
maximum where the within cluster
scatter equals the between cluster
scatter. Therefore it is hypothesized that
the optimal number of clusters (k) occurs
at $\beta$ equal to its maximum. These
values of $\beta$ and k are then used to control the
output segmentor and the feature rejector.
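
The fidelity measure, the product of the traces of the within cluster and between cluster scatter matrices, can be computed without forming the matrices, since each trace is a sum of squared distances. The sketch below uses one common definition of the scatter matrices; the normalization in [2] may differ, and the function name and the four sample points are assumptions.

```python
import numpy as np


def fidelity(points, labels):
    """Product tr(S_W) * tr(S_B) for a labelled set of points."""
    overall_mean = points.mean(axis=0)
    tr_within = 0.0
    tr_between = 0.0
    for j in np.unique(labels):
        members = points[labels == j]
        mean_j = members.mean(axis=0)
        # The trace of a scatter matrix is a sum of squared distances.
        tr_within += ((members - mean_j) ** 2).sum()
        tr_between += len(members) * ((mean_j - overall_mean) ** 2).sum()
    return tr_within * tr_between


points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
beta_one = fidelity(points, np.array([0, 0, 0, 0]))
beta_two = fidelity(points, np.array([0, 0, 1, 1]))
beta_four = fidelity(points, np.array([0, 1, 2, 3]))
```

With these definitions the two traces sum to the total scatter, which does not depend on k. The product therefore vanishes both for a single cluster and for one cluster per point, and is largest when the two traces are equal.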

The  feature  rejector  provides  the 
function  of  removing  those  features  which 
do  not  contribute  to  tight  homogeneous 
clusters.  Consequently,  this  process  bor- 
rows from  supervised  pattern  recognition 
theory  in  which  feature  selection/rejection 
is  often  implemented  through  the  use  of  the 
Bhattacharyya  distance  function  [3].  This 
function  provides  a measure  of  the  useful- 
ness of  a particular  dimension  or  feature 
by  investigating  that  feature's  ability  to 
separate  the  data  points  into  the  proper 
clusters  determined  by  the  k-means  conver- 
gence algorithm.  This  measure  is  provided 
by  mean  and  variance  parameters  determined 
by  each  dimension  for  all  the  clusters. 
Those features or dimensions which do not
provide well-defined clusters (that is, clusters
with well-separated means and tight variances) are
rejected, thereby leaving good features for
tighter, more homogeneous clusters.
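
As an illustration of how mean and variance parameters rank a feature, the sketch below evaluates the Bhattacharyya distance between two clusters along one dimension. It assumes each cluster is Gaussian along that dimension and uses the standard closed form for that case. How the pairwise distances are combined over all clusters is not stated here, so that step is omitted.

```python
import math


def bhattacharyya_1d(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two 1-D Gaussian clusters."""
    separation = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    shape = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return separation + shape


# Feature A separates the two clusters well; feature B does not.
good = bhattacharyya_1d(0.0, 1.0, 6.0, 1.0)
poor = bhattacharyya_1d(0.0, 4.0, 1.0, 4.0)
same = bhattacharyya_1d(3.0, 2.0, 3.0, 2.0)
```

A feature with widely separated means and small variances gives a large distance and would be kept; one with overlapping clusters gives a value near zero and would be rejected.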

EXPERIMENTAL  RESULTS 

A variety  of  images  have  been  segment- 
ed using  the  above  clustering  algorithm 


with varying degrees of perceptual
success. Figures 2 and 3 present these
results in pictorial form. Figures 2a and
3d were original monochrome images, while
figure 2d was a color image and figure 3a
was a ten band multispectral image.

Various  clustering  results  are  presented 
for  each  image  for  viewer  inspection. 

The  last  sequence  in  figure  3 represents 
clustering  on  frame-to-frame  imagery  to 
illustrate  the  potential  for  real  time 
hardware  smart  sensor  implementation. 

Probably  a more  relevant  representa- 
tion of  the  segmentor  in  operation  is  to 
view  the  Bhattacharyya  measures  and  clus- 
tering fidelity  factors  all  as  a function 
of  k,  the  number  of  clusters  for  each 
iteration  of  the  k-means  clustering  al- 
gorithm. These  results  are  presented  in 
figures  4 and  5.  In  figure  4 two  plots 
are  presented  illustrating  the  performance 
of  the  Bhattacharyya  feature  rejector.  In 
figure 4a the Bhattacharyya distance
values are plotted for each dimension or
feature in the correlated space for the
variables $\{x_1, x_2, \ldots, x_N\}$ from figure 1.
In figure 4b the Bhattacharyya distances
are plotted for each rotated dimension or
feature in the decorrelated space for the
variables $\{y_1, y_2, \ldots, y_N\}$ of figure 1. It
is immediately obvious that decorrelating
(rotating the space) produces one outstanding
feature, which hopefully will allow
effective clustering in a vastly reduced
vector space (see figure 2b). In addition,
it is obvious that the good features
(large Bhattacharyya values) tend to be
good for all cluster numbers, indicating a
degree of consistency which allows
rejection of those dimensions with small
Bhattacharyya measure with some degree of
confidence.

Figure 5 indicates how the cluster
fidelity parameter, $\beta$, behaves as the
number of clusters increases. Specifically,
figure 5a indicates that for the
monochrome APC image, without feature
rejection, the peak of $\beta$ is quite poorly
defined  because  of  the  presence  of  a lot 
of  useless  features  essentially  adding 
noise  to  the  well-defined  clusters. 

However  for  the  case  of  the  four  best 
features  or  the  single  best  feature,  a 
much  more  marked  peak  results  at  a lower 
cluster  number.  A similar  effect  occurs 
for the colored house of figure 5b.
However, the change from the curve for all
features to the curves for the best few features
is not as dramatic. This is because
the use of color features provides a
considerable  improvement  in  the  segmenta- 
tion power  of  the  system  compared  to  hav- 
ing only  monochrome  features.  This  result 
correlates  well  with  our  intuitive 
experiences  in  which  color  and  multispec- 
tral signatures  provide  quite  useful  aids 
for  human  visual  segmentation  procedures. 


CONCLUSION 

The  above  description  covers  the  high- 
lights of  the  segmentor  developed  by 
Coleman. The interested reader is referred
to  reference  [1]  for  details  of  the  system. 
The  algorithm  represents  a bottom  up 
attempt  at  automatically  segmenting  imag- 
ery without  the  aid  of  human  intervention. 
It  allows  any  conceivable  set  of  features 
to  be  used  for  clustering  but  reserves  the 
right  to  feature  reject  those  parameters 
which  do  not  contribute  to  well-defined, 
tight  clusters.  The  technique  is  based 
upon  the  principles  of  mathematical 
clustering  algorithms  in  N dimensional 
vector  space.  The  underlying  hypothesis 
for  success  of  the  technique  is  based 
upon  the  premise  that  tight,  well-defined, 
homogeneous  clusters  in  vector  space  cor- 
respond to  well-defined  homogeneous 
regions  in  an  image.  If  this  premise  is 
true, successful (i.e., consistent with
human perception) segmentation results.


If segmentation is unsuccessful, then
improper features were provided in the
feature computation phase: features which, through
the linear operations of decorrelation
and feature rejection, do not provide
proper region segments. It is then
conjectured that nonlinear transformations
(or other features) are necessary. Finally,
the segmentor has been designed with
smart sensor real time implementation in
mind. The hardware construction of such
a system is under contemplation.

FEATURE TABLE I

INDEX   FEATURE DESCRIPTION        FEATURE CLASS

x1      monochrome brightness      monochromatic amplitude

x2      red color brightness       color amplitude
x3      green color brightness
x4      blue color brightness

x5      band 1 brightness          multispectral amplitude
x6      ...
x10     band 6 brightness

x11     Sobel magnitude on x1      texture feature
x12     Sobel magnitude on x2
...
x20     Sobel magnitude on x10

x21     Sobel phase on x1          texture orientation
...
x30     Sobel phase on x10

x31     mode filter on x1          nonlinearly filtered feature
...
x40     mode filter on x10
x41     dispersion filter on x1
...
x50

Figure 2. Pictorial Clustering Results

ABSTRACT

Three  new  developments  in  the  use  of 
convergent  evidence  are  presented: 

1.  The  clustering  of  edge  values  for 
threshold  selection. 

2.  The  use  of  "conformity"  — a 
measure  of  region  definedness. 

3.  A recursive  region  extraction  al- 
gorithm. 


INTRODUCTION 

The  use  of  separate  sources  of  infor- 
mation to  corroborate  or  strengthen  an 
assertion  ("convergent  evidence")  has 
proven  very  useful  in  our  research  on  FLIR 
image  understanding.  Two  reports  [1,  2] 
have  described  previous  applications  of 
the concept: one concerns region extraction;
the second, region tracking. In
this  report,  we  discuss  three  recent 
applications. 

CLUSTERING  EDGE  VALUES  FOR  THRESHOLD  SELEC- 
TION 

A variety of schemes for selecting
thresholds are known [3]. Many of them
attempt to deduce the best threshold by
studying properties associated with points
on the boundary between object and
background. Thus if the object is substantial
and contrasts with the background, then the
image's gray level histogram will exhibit
a valley at the gray level associated with
border points. More complicated schemes
study the cooccurrence of high edge value
and gray level as represented by a
two-dimensional histogram. It has been shown
that for images containing one object
class and one background class, the
average gray level of high edge value points
predicts a good threshold [4]. This
scheme, however, fails for images having
several object classes, since high edge
values may arise from the adjacencies among
several gray level populations, each
requiring a different threshold. In what
follows, we present an approach to
thresholding in a multi-population
environment.


Our  approach  is  to  produce  clusters 
of  points  corresponding  to  region  borders 
and  to  associate  the  average  gray  level 
of  each  cluster  with  a threshold  for  the 
corresponding  region.  Region  borders 
usually  correspond  to  points  of  locally 
maximum  edge  response.  Our  previous  work 
suggests  the  use  of  edge  detectors  which 
select at each point the maximum
difference of averages of adjacent
neighborhoods over several directions [5]. By
suppressing non-maximum responses normal
to the selected direction (i.e., across
the edge), thin contours result which
appear to surround object regions. Figure
1b shows unthinned edge detector
response; Figure 1c illustrates the results
after non-maximum suppression.

This  process  produces  as  a by-product 
points  with  very  low  edge  value,  includ- 
ing values  which  truncate  to  zero.  Such 
points  correspond  to  the  interiors  of 
homogeneous  regions.  It  is  useful  to  in- 
clude such  points  in  our  analysis,  even 
though  they  constitute  a population  with 
fundamentally  different  properties  from 
contour  points.  Figure  2 illustrates 
thinned  detector  responses  with  region 
interior  maxima  included. 

Once  the  thinning  has  been  accompl- 
ished, the  resulting  points  can  be  accu- 
mulated in  a two-dimensional  histogram. 
Each  point  contributes  an  edge  value  and 
a gray  level  value.  In  practice,  it  was 
found  that  the  gray  level  of  the  point 
does  not  serve  as  a good  coordinate. 
Specifically,  for  a step  edge,  the  gray 
level  of  an  edge  point  lies  in  one  popula- 
tion or  the  other  but  not  at  any  inter- 
mediate gray  value.  Figure  3 shows  a 
test  pattern  and  the  2-D  histogram  result- 
ing from  plotting  points  based  on  gray 
level  and  edge  value.  (Gray  value  is 
plotted  from  left  to  right;  edge  value, 
from top to bottom.) Note that the
histogram consists of a pair of spikes
corresponding to the gray levels on either
side of the step edge. A single cluster
results if the average gray level of the
two neighborhoods which contributed the
maximal edge response at the point is
plotted on the gray scale. Figure 3c
shows the effect on the histogram of
using the average gray level instead of the
point gray level.
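
The effect is easy to reproduce on a one-dimensional step. The sketch below is a simplification of the detector described above: it works along a single scan line rather than over several directions, and the function name, neighborhood size and test signal are assumptions.

```python
import numpy as np


def edge_points_1d(signal, half_width=2):
    """Return (position, edge value, average gray) at thinned edge points."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    edge = np.zeros(n)
    avg = np.zeros(n)
    for i in range(half_width, n - half_width + 1):
        left = signal[i - half_width:i].mean()
        right = signal[i:i + half_width].mean()
        edge[i] = abs(right - left)       # difference of neighborhood averages
        avg[i] = 0.5 * (left + right)     # gray value plotted in the histogram
    points = []
    for i in range(1, n - 1):
        # Non-maximum suppression: keep strict local maxima of edge response.
        if edge[i] > edge[i - 1] and edge[i] > edge[i + 1]:
            points.append((i, float(edge[i]), float(avg[i])))
    return points


step = [20, 20, 20, 20, 20, 40, 40, 40, 40, 40]
points = edge_points_1d(step)
```

The single surviving point has edge value 20 and average gray level 30, midway between the two populations, whereas the gray level of the point itself is 40.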

In  images  consisting  of  disjoint 
homogeneous  objects  on  a homogeneous  back- 
ground, the  thinned  edge  contours  will 
correspond to clusters in the 2-D
histogram. Moreover, their interiors will
produce clusters at edge values close to zero.
It is important to note that the size of a
cluster (i.e., the number of points in it)
is closely related to properties of the
region it describes. Thus interior
clusters relate both to the area of the
region and to the size of the neighborhood
over which the local operations (edge
detection, non-maximum suppression) are
defined. For small object regions, there
may be few or no points sufficiently far from the
object boundary to resist suppression.
Thus interior clusters may be
indistinguishable from noise, or may be
nonexistent.

Clusters  of  points  at  higher  edge 
values  are  more  likely  to  be  significant 
(based on our homogeneity assumptions).

The  size  of  an  edge  cluster  is  therefore 
related  to  the  perimeter  of  the  surrounded 
region  in  the  image.  Since  perimeter  in- 
creases (roughly,  for  digital  images)  as 
the  square  root  of  area,  the  edge  clusters 
for  objects  of  moderately  different  areas 
should,  nonetheless,  be  of  comparable  size. 
A priori  estimates  of  size  are  of  use  in 
discriminating  true  edge  clusters  from 
random  noise. 

Edge clusters and interior clusters
bear certain relationships to one another.
For example, an edge cluster whose
centroid is (e, g), where e is the average
edge value and g the average plotted gray
level, probably serves as the contour
separating two regions of gray level g - e/2
and g + e/2. Finding two interior clusters
at gray levels g - e/2 and g + e/2, respectively,
would serve as confirmation of this
assertion. Conversely, to determine whether
two regions with average gray levels g1 and g2,
respectively, share a common edge (i.e.,
are adjacent), one could attempt to locate
an edge cluster with centroid
(|g2 - g1|, (g1 + g2)/2). Figure 4 shows
three regions with various types of
adjacency and the 2-D histograms which derive
from them. Lower gray level values
correspond to darker shades of gray.
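
These relationships translate directly into code. The following sketch is illustrative only; the function names, the matching tolerance and the sample gray levels are assumptions.

```python
def interior_levels(edge_value, gray_level):
    """Gray levels of the two regions separated by edge cluster (e, g)."""
    return gray_level - edge_value / 2, gray_level + edge_value / 2


def expected_edge_cluster(g1, g2):
    """Centroid (e, g) of the edge cluster between regions g1 and g2."""
    return abs(g2 - g1), (g1 + g2) / 2


def confirmed(edge_cluster, interior_grays, tolerance=2.0):
    """True if both predicted interior clusters are present."""
    low, high = interior_levels(*edge_cluster)

    def present(target):
        return any(abs(g - target) <= tolerance for g in interior_grays)

    return present(low) and present(high)


# Interior clusters at gray levels 20, 30 and 40 (hypothetical values).
interiors = [20.0, 30.0, 40.0]
cluster = expected_edge_cluster(20.0, 40.0)
```

Here the regions at 20 and 40 predict an edge cluster at (20, 30), and that cluster is confirmed by the interior clusters. An edge cluster at (30, 30) would predict interiors at 15 and 45 and would not be confirmed.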

We  have  investigated  some  simple 
methods  of  cluster  extraction,  and  we  now 
describe  one  which  has  been  moderately 
successful.  First,  use  the  histogram  of 
thinned  edge  values  (the  projection  of  the 
2-D  histogram  on  the  edge  axis)  to  detect 
edge  value  ranges  containing  significant 
peaks.  Each  of  these  ranges  corresponds 
to  a strip  across  the  2-D  histogram.  Now 
construct  a gray  value  histogram  for  each 
strip  and  segment  it  according  to  its 


peaks.  Each  such  segment  corresponds  to 
a rectangle  in  the  2-D  histogram. 

Clusters  are  associated  with  well- 
populated  rectangles.  Thresholds  are 
then  computed  as  the  average  gray  levels 
within  clusters. 
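
The extraction procedure above can be sketched as follows. This is only an illustration under stated assumptions: "significant peaks" are approximated by maximal runs of histogram bins at or above a count threshold, and the three thresholds (`edge_min`, `gray_min`, `cluster_min`) are hypothetical parameters, not values from the text.

```python
def runs_above(values, min_count):
    """Return inclusive (start, end) index pairs of maximal runs
    whose values are all >= min_count."""
    runs, start = [], None
    for i, v in enumerate(values):
        if v >= min_count and start is None:
            start = i
        elif v < min_count and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(values) - 1))
    return runs


def cluster_thresholds(hist2d, edge_min, gray_min, cluster_min):
    """hist2d[e][g] = number of thinned edge points with edge value e
    and gray level g. Returns the average gray level of each cluster."""
    n_gray = len(hist2d[0])
    # Projection of the 2-D histogram on the edge axis.
    edge_proj = [sum(row) for row in hist2d]
    thresholds = []
    for e0, e1 in runs_above(edge_proj, edge_min):
        # Gray value histogram of the strip of edge values e0..e1.
        strip = [sum(hist2d[e][g] for e in range(e0, e1 + 1))
                 for g in range(n_gray)]
        for g0, g1 in runs_above(strip, gray_min):
            # Each (strip, gray segment) pair is a rectangle.
            count = sum(strip[g0:g1 + 1])
            if count >= cluster_min:  # well-populated rectangle
                mean = sum(g * strip[g] for g in range(g0, g1 + 1)) / count
                thresholds.append(mean)
    return thresholds
```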

Given a set of thresholds for an
image, it is unclear how one applies them
to extract regions. For example, consider
the drawing in Figure 5a and its
2-D histogram, Figure 5b. The center of
the  edge  cluster  belonging  to  the  interior 
clusters  at  gray  levels  30  and  40  is  at 
gray  level  35;  while  the  edge  cluster 
separating  the  interior  clusters  at  20 
and  40  has  30  as  its  center.  Thus  the 
thresholds  are  30  and  35.  The  threshold 
at  30  will  optimally  separate  the  back- 
ground from  the  outer  boundary  of  the 
ring;  however,  it  will  cause  the  hole 
in  the  ring  to  break  up  in  a random 
fashion.  The  threshold  at  35  will  in 
fact  separate  the  hole  from  the  ring  but 
will  assign  too  many  border  points  of  the 
20-40  border  to  the  background  region. 

Thus  neither  threshold  is  by  itself 
optimal.  One  possible  solution  is  to 
partition  the  2-D  histogram  into  disjoint 
regions  which  are  labelled  as  to  object 
class  (Figure  6) . Thus  all  points  in  the 
original  image  will  be  classified  based 
on  a feature  pair  (gray  level  and  edge 
value) . The  location  of  each  feature 
pair  in  the  partitioned  histogram  will 
determine  the  object  class  to  which  each 
image point belongs. Methods for partitioning
the 2-D histogram are being investigated.
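
Classification by a partitioned 2-D histogram amounts to a table lookup on the (gray level, edge value) pair. The sketch below assumes the partition is given as labelled rectangles; the rectangle boundaries in `EXAMPLE_PARTITION` are hypothetical and are not taken from Figure 6.

```python
def classify(gray, edge, partition, default=None):
    """Return the object class of the first rectangle containing
    the feature pair (gray, edge); bounds are half-open."""
    for g_lo, g_hi, e_lo, e_hi, label in partition:
        if g_lo <= gray < g_hi and e_lo <= edge < e_hi:
            return label
    return default


def label_image(gray_img, edge_img, partition):
    """Classify every image point from its gray level and edge value."""
    return [[classify(g, e, partition) for g, e in zip(grow, erow)]
            for grow, erow in zip(gray_img, edge_img)]


# Hypothetical partition for a scene like Figure 5a
# (background 20, disk 30, ring 40); boundaries are illustrative only.
EXAMPLE_PARTITION = [
    (0, 25, 0, 5, "background"),    # low edge value: interior points
    (25, 35, 0, 5, "disk"),
    (35, 256, 0, 5, "ring"),
    (0, 33, 5, 256, "background"),  # high edge value: border points
    (33, 256, 5, 256, "ring"),
]
```

Because the class depends on both features, border points need not be assigned by a single gray level threshold.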

THE  USE  OF  "CONFORMITY"  - A MEASURE  OF 
REGION  DEFINEDNESS 

The  "Superslice"  algorithm  [1]  re- 
lies on  the  heuristic  that  thresholded 
object  regions  are  distinct  from  back- 
ground because  they  contrast  with  their 
surround  at  a well-defined  border.  The 
coincidence  of  high  contrast  and  high 
edge  value  at  the  border  of  a thresholded 
region  is  an  example  of  the  use  of  con- 
vergent evidence  supporting  the  assertion 
of  the  object  region.  The  definedness  of 
the  border  may  be  evaluated  as  the  per- 
centage of  the  border  points  which  coin- 
cided with  the  location  of  thinned  edge 
(locally  maximum  edge  response) . Thus  a 
match  score  of  50%  means  that  half  the 
border points are accounted for as being
on  the  edge.  However,  it  does  not  mean 
that  the  matched  points  adequately  repre- 
sent the  object.  Figure  7 illustrates 
two  cases  of  50%  match.  (Matched  points 
are  indicated  by  thick  strokes.)  Clearly, 
the  second  case  is  a better  representa- 
tion than  the  first. 

The traversal of the border of a
thresholded region induces an ordering on
the matched points. Let r1, ..., rn be the
runs of matched points encountered during
a border  traversal.  By  connecting  the 
proximal  ends  of  runs  along  the  traversal, 
one  creates  a polygonal  approximation  to 
the  thresholded  region.  We  define  "con- 
formity" as  the  measure  of  match  of  the 
polygonal  approximation  to  the  thresholded 
region.  High  conformity  means  that  the 
region  is  well-represented  by  its  approxi- 
mation regardless  of  the  actual  percentage 
of  matched  border  points.  Figure  7a 
illustrates  low  conformity;  while  Figure 
7b  shows  good  conformity. 

Conformity is evaluated as the ratio
of the absolute difference in area (between
the two polygonal representations)
to the area of the thresholded region. Experiments
have indicated its utility as a
feature discriminating noise from objects.
A quantitative study of its classification
value is underway.
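
The match score and the conformity ratio can be illustrated on a small example. This sketch assumes the border is given as an ordered list of (x, y) vertices with a parallel list of matched flags, and that the polygonal approximation is obtained by keeping only the matched points in traversal order; under the ratio as stated, a value near 0 is read here as good conformity. The function names are illustrative.

```python
def polygon_area(points):
    """Area of a closed polygon by the shoelace formula."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def match_score(matched):
    """Fraction of border points that coincide with thinned edge."""
    return sum(matched) / len(matched)


def conformity_ratio(border, matched):
    """|area(region) - area(approximation)| / area(region).
    Assumes the border encloses a non-zero area."""
    approx = [p for p, m in zip(border, matched) if m]
    region_area = polygon_area(border)
    return abs(region_area - polygon_area(approx)) / region_area


# A 2x2 square traversed counterclockwise, 8 border points.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
corners_matched = [True, False, True, False, True, False, True, False]
one_side_matched = [True, True, True, True, False, False, False, False]
```

Both flag patterns give a 50% match score, but the first yields a ratio of 0 (the corners reproduce the square) while the second yields 0.75, mirroring the contrast between Figures 7b and 7a.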

A RECURSIVE  REGION  EXTRACTION  ALGORITHM 

Much effort has been devoted to algorithms
which segment images based on regions
homogeneous with respect to selected
feature values. Ohlander [6] developed a
recursive framework for region extraction
which  asserts  the  existence  of  homogeneous 
regions  based  on  well-defined  modes  in  one 
of  several  histograms.  The  extraction  of 
a set  of  points  corresponding  to  a mode  of 
some  feature  forces  the  recomputation  of 
the  other  feature  histograms  for  the  re- 
maining points.  Given  a sufficient  number 
of  (more  or  less)  independent  features, 
point  sets  can  be  continually  extracted 
until  further  decomposition  produces  only 
noise  regions. 

In our work, we have attempted to extract
only those regions whose assertion
is  warranted  by  additional  evidence.  We 
have  developed  measures  of  contrast  and 
well-definedness  in  order  to  accept  or  re- 
ject regions  proposed  by  slicing.  These 
heuristics  can  be  used  to  strengthen  the 
recursive  algorithm  and  to  allow  its  oper- 
ation when  very  few  feature  histograms  are 
present.  In  fact,  using  gray  level  as  the 
single feature for computing a histogram,
it  is  possible  to  segment  complex  images. 

The algorithm operates as follows:

a. Choose a "best" slice range from
the histogram. (Section 1
suggests some possibilities.)

b. Slice the image according to the
slice range.

c. Extract those regions which satisfy
the Superslice criteria (see
Section 2 for further details).
Return all other points in the
slice range to the background set,
and recompute the feature histogram.

d. Apply the algorithm recursively
to the background set and to each
of the extracted regions.

The use of the Superslice algorithm
to extract object regions while rejecting
noise regions means that points lying in
the slice range but not conforming to an
object region are returned to the pool of
histogrammed points. Thus a "liberal"
slice range (i.e., extending beyond
valleys in the histogram) will allow good
region definition with the extra points
returned to the unclassified pool. The
resulting histogram does not have a
"carved-out" look to it, and further
slice range selection is possible.

In order to test out this idea, we
constructed an interactive system which
allows the user to display the gray level
histogram, select a slice range, extract
well-defined regions, and construct masks
of accepted regions. Figure 8 displays
the products of the analysis of an image
using this approach.
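
The slice, extract, and return-to-pool loop (steps a through c) can be sketched as below. This is a simplified, iterative illustration rather than the authors' program: the slice selector `choose_slice` and the acceptance test `accept` are hypothetical callables standing in for the histogram analysis and the Superslice criteria, and the recursion of step d into extracted regions is omitted.

```python
from collections import deque


def components(pixels):
    """Split a set of (row, col) pixels into 4-connected components."""
    remaining, comps = set(pixels), []
    while remaining:
        seed = remaining.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def extract_regions(image, pool, choose_slice, accept):
    """Repeatedly slice the pool of unclassified pixels and keep
    only the connected regions that pass the acceptance test."""
    regions = []
    while pool:
        # Recompute the gray level histogram of the remaining pool.
        hist = {}
        for r, c in pool:
            hist[image[r][c]] = hist.get(image[r][c], 0) + 1
        rng = choose_slice(hist)                      # step a
        if rng is None:
            break
        lo, hi = rng
        sliced = {p for p in pool
                  if lo <= image[p[0]][p[1]] <= hi}   # step b
        accepted = [comp for comp in components(sliced)
                    if accept(comp)]                  # step c
        if not accepted:
            break  # simplification: stop when a slice yields nothing
        for comp in accepted:
            regions.append(comp)
            pool = pool - comp  # rejected points stay in the pool
    return regions
```

Points inside the slice range that do not form an accepted region are never removed from `pool`, which is the "liberal slice range" behavior described in the text.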
Figure 1a. Window containing tank.
1b. Edge detector response (thresholded).
1c. Thinned edge detector response.

Figure 2a. LANDSAT window of Monterey.
2b. Thinned edge detector response.

Figure 3a. Square on background.
3b. 2-D histogram with gray level
as x-axis and edge value as
y-axis (stretched).
3c. 2-D histogram with average
gray level as x-axis and edge
value as y-axis (stretched).

Figure 4a. Disk (gray level 30) within
ring (gray level 40) within
background (gray level 20).
4b. 2-D histogram (distorted for
visibility - interior of background
is leftmost, topmost cluster).

Figure 5a. Adjacent object region on
background (same as Figure 4a).
5b. 2-D histogram.

Figure 6. 2-D histogram of Figure 5a,
partitioned into classification
regions.

Figure 7a. Contour whose matched edge
points (thickened strokes)
exhibit poor conformity.
7b. Contour showing good conformity.

ABSTRACT

The  past  six  months  have  seen  progress 
on  a variety  of  fronts  including  object 
estimation  in  noise,  phase  unwrapping  for 
restoration,  perceptual  models  for  smart 
sensor  design,  and  degree  of  freedom  models 
for  a variety  of  SAR  imaging  configura- 
tions. These  and  other  results  will  be 
summarized.


SMART  SENSORS 

Testing  of  the  Sobel  circuit  and 
adaptive  circuit  is  progressing  quite 
nicely  at  the  Hughes  Research  Laboratories. 
Results  on  these  two  circuits  are  discussed 
elsewhere  in  this  workshop  proceedings  for 
the interested reader. Plans for subsequent
circuit designs include: a) circuits
operating at real time TV rates with
8 bits  (%7.)  accuracy,  b)  development  of  a 
texture  measuring  circuit,  and  c)  investi- 
gation of  a segmentor  circuit. 

IMAGE  UNDERSTANDING  PROJECTS 

Professor Nevatia and Dr. Price have
concentrated on applying image understanding
techniques to locating structures in
aerial images. Contextual knowledge is
utilized in the process. In addition,
circle detectors in noisy images have been
investigated for application to shape
structure development algorithms.

Professor Pratt and his students have
concentrated on quantitative edge evaluators,
with some quite successful comparisons
resulting. Signal detection as well as
pattern recognition approaches are utilized
in the evaluation process. In addition,
other results include initiation of research
into the use of the singular value
decomposition operator as a useful texture
aid. So far, various texture patterns have
been analyzed, and the singular value maps
of both artificial and natural textures of
similar classes (i.e. grass) are nearly
identical while for different classes
(i.e. ivy), the maps are also different.
Therefore, it appears that the shape of the
singular values may provide a useful
indication of texture measure.

Finally, the Coleman segmentor has
been refined and used to process monochrome,
color, multispectral, and frame-to-frame
imagery with varying degrees of
"perceptual success." This workshop
proceedings has a summary of the segmentor
performance evaluation parameters, and the
interested reader may learn of the details
of the project from USCIPI Report #750.

PERCEPTUAL  MODELING  FOR  IMAGE 
UNDERSTANDING 

Two  topics  in  the  area  of  human  visu- 
al system  (HVS)  modeling  appear  to  be 
quite  relevant  to  the  image  understanding 
program.  First  as  a preprocessor,  it  now 
appears  feasible  to  use  the  nonlinear 
perceptual  model  as  a front  end  transform- 
ation preprocessor  on  a smart  sensor 
device.  This  will  allow  subsequent  signal 
and  image  understanding  processing  to  be 
implemented  much  more  efficiently  in  an 
adaptively  compressed  image  domain.  Sec- 
ondly the  statistical  and  potential  trans- 
form domain  properties  of  this  perceptual 
space  are  analyzed  and  results  presented, 
indicating  quite  high  potential  for  ef- 
ficient color  image  coding  and  texture 
evaluation  and  measurement. 

RESTORATION 

Digital image restoration has been a
research topic for some time now and has
resulted in a variety of reconstruction
algorithms. Recently progress has been
made in the area of detection-estimation
theory applied to boundary definitions of
objects buried in large sensor noise systems.
This work has culminated in a dissertation
topic and the interested reader
is referred to USCIPI Report #760 for
further details.

The problem of a posteriori (after
the fact) restoration has been under investigation
with particular emphasis on
recovering the phase of the optical transfer
function (OTF) distortion of a space
invariant imaging system. Recursive
algorithms for both magnitude (modulation
transfer function or MTF) and phase of the
OTF have been developed. Real imagery is
now being processed with significant
results.

Finally  a new  project  has  been  initia- 
ted over  the  past  six  months  specifically 
designed  to  meet  the  image  understanding 
needs  of  nonvisual  imagery.  Specifically, 
the  degrees  of  freedom  of  a variety  of 
radar  imaging  systems  are  under  investiga- 
tion. The  stripping  and  spotlight  SAR 
modes  of  radar  imagery  collection  are  being 
analyzed  as  to  their  inherent  information 


content, both to define and measure limitations
and provide suggested improvements.
However  the  more  relevant  objective  of  the 
project  is  to  provide  automatic  interpret- 
ation and  understanding  of  the  imagery. 

This  is  particularly  appropriate  as  human 
interpretation  of  radar  imagery  is  somewhat 
limited  by  the  foreign  looking  appearance 
of  such  pictures  compared  to  natural 
(visibly  sensed)  imagery. 




INTERACTIVE AIDS FOR CARTOGRAPHY AND PHOTO INTERPRETATION:
PROGRESS  REPORT,  OCTOBER  1977. 


H.G. Barrow (Principal Investigator),
R.C. Bolles, T.D. Garvey, J.H. Kremers,
K. Lantz, J.M. Tenenbaum, and H.C. Wolf.

Artificial  Intelligence  Center, 

SRI  International 
Menlo Park, CA 94025.


I  INTRODUCTION 

This report describes the ongoing SRI image
understanding project. The central scientific goal
of this project is to investigate and develop ways
in which diverse sources of knowledge may be
brought to bear on the problem of interpreting
images. The research is focused on the specific
problems entailed in interpreting aerial
photographs for cartographic or intelligence
purposes. Additional details are to be found in
earlier progress reports [1] [2] [3].

A key concept is the use of a generalized
digital map to guide the process of image
interpretation. This map is actually a data base
containing generic descriptions of objects and
situations, available imagery, and techniques, in
addition to topographical and cultural information
found in conventional maps.

We recognize that within the limitations of
the current state of image understanding it is not
possible to replace a skilled photo interpreter.
It is possible, however, to greatly facilitate his
work by providing a number of collaborative aids
that relieve him of his more mundane and tedious
chores [1].


II OVERVIEW OF HAWKEYE

Our work has been centered on evolutionary
development toward an integrated interactive
system. The system consists of an interactive
display console, a map data base, an image library,
general image analysis routines, and task
specialist routines. The capabilities described
here have been demonstrated as independent programs
that share only data files. We are in the process
of integrating them into a single coherent system,
known as Hawkeye. Users communicate with Hawkeye
naturally, in free-form English and via interactive
graphics. The following scenario illustrates the
major capabilities that have been demonstrated to
date.

The first task when a new image enters the
system is to establish correspondence with the map.
This is accomplished automatically, by selecting
potentially visible landmarks (using navigational
data associated with the image) and then locating
them in the image using scene analysis techniques.
The next step is to confirm the validity of
existing knowledge. The system can automatically
verify the presence of certain cartographic
features, such as roads and waterways, and can also
monitor the status of some typical dynamic
situations, such as ships berthed in harbor or
boxcars stored in a classification yard. New
features are identified and incorporated into the
data base through the use of a number of
interactive aids for mensuration and tracing. For
example, new roads can be traced, or heights of
bridge supports can be measured.

The system can now use the data base to answer
simple queries, such as "show me PierlH", "what is
this building?" or "how high is that mountain?".
These queries are entered by a photo interpreter
via keyboard and display cursor. It also has the
potential for responding to a more complex query,
such as "how many ships were in Oakland-Harbor
yesterday?", by retrieving the relevant image from
the library, and then invoking the appropriate task
specialist. The system has the ability to accept
such requests entered remotely (say, by
intelligence analysts) and execute them
automatically if it understands, or else relay them
to the user (the photo interpreter) for interactive
execution.

At  this  time,  the  questions  that  can  be 
handled  automatically  are  limited  by  the  present 
small  size  of  the  data  base  and  the  available 
specialist  routines,  which  automate  tasks  carefully 
chosen  to  exploit  existing  primitive  low-level 
vision  capabilities.  Demonstrated  capabilities  do, 
however,  show  the  potential  of  bringing  image 
understanding  and  artificial  intelligence 
approaches  to  bear  on  problems  in  cartography  and 
photo interpretation.

A. Image Analysis

In this section, we describe the components of
Hawkeye that are directly concerned with image
analysis functions, such as those illustrated in
the scenario.

1. Map/image correspondence

The first task in the scenario is putting
the sensed image into geometric correspondence with
reference imagery or a map data base. This is
fundamental to virtually every military application
of imagery. Our initial approach was a modest
improvement on conventional image correlation.
Given an image, such as Figure 1, and approximate
viewpoint, the system determined potentially
visible landmarks and then retrieved from the
library images containing the landmarks. Figure 2
shows a selected reference image with the area of
overlap and the contained landmarks overlaid on it.

For each landmark, an appropriate area of
the reference image was extracted and reprojected
to make it appear more similar to the sensed image.
The reprojection was accomplished using a camera
model, calibration data associated with the
reference image, and elevation data obtained from
the map. The reference image fragment was first
projected down onto the ground plane, and thence
back up onto the image plane of the sensing camera.
Each reprojected image fragment was then correlated
in a small predicted area of the sensed image,
using Moravec's high-speed algorithm [5]. Figure
3 shows details of the sensed (right top) and
reference (left top) images near a landmark. The
bottom left detail is the 16x16 image chip
surrounding the landmark automatically extracted
for use by the system. The landmark is sought in
the area delimited by the large square in the
sensed image, and the best matching area is shown
at bottom midright. The reprojected version of the
chip is shown at bottom midleft, and the best
matching area at bottom right. Note that the
reprojected reference image more closely resembles
the sensed image and that the point of
correspondence is therefore more precisely located.
Figure 4 illustrates improved reliability: without
reprojection, the best match is at the wrong
location (indicated by X).

The matching process is repeated for all
landmarks expected to be visible. This yields a
set of points in the sensed image, with each point
corresponding to a particular landmark (Figure 5).
From the pairs of corresponding image and world
locations, the exact camera parameters for the
sensed image were computed by solving an
overconstrained set of equations. We can determine
a least-squared-error solution either directly
analytically, or by an iterative parameter
optimisation process: the latter has the advantage
that any known constraints on parameter values can
be readily imposed.
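
The direct analytic route can be illustrated with a much simpler stand-in for the camera model. The sketch below assumes a hypothetical affine relation u = a*X + b*Y + c between ground coordinates (X, Y) and one image coordinate u, and solves the overconstrained system through the normal equations; the actual camera model has more parameters and is nonlinear.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting (a is assumed non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_affine_row(world_pts, image_coord):
    """Least-squares (a, b, c) minimizing sum (a*X + b*Y + c - u)^2
    over landmark pairs; needs at least 3 non-collinear landmarks."""
    rows = [(X, Y, 1.0) for X, Y in world_pts]
    # Normal equations: (A^T A) p = A^T u.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    atu = [sum(r[i] * u for r, u in zip(rows, image_coord))
           for i in range(3)]
    return solve_linear(ata, atu)
```

With more landmarks than unknowns, each extra correspondence tightens the estimate; an iterative optimizer would be used instead when constraints on parameter values must be imposed.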

The reprojection technique (unlike
currently used techniques) permits the use of
reference images that differ radically in viewpoint
from the sensed image. Even an oblique image, such
as shown in Figure 6, can be matched against the
same reference image. Figure 7 shows matching for
a single landmark. The views are so different that
a meaningful match is impossible without
reprojection.

Although reprojection prior to matching
is an improvement on conventional image
correlation, the fundamental limitation of the
correlation approach, namely sensitivity to viewing
conditions, remains. In particular, it still
cannot match images obtained from radically
different viewpoints when the three-dimensional
scene structure is complex, from different sensors,
or under different illumination or climatic
conditions; and it cannot match images against
symbolic maps. To overcome these limitations, we
developed a new approach, which we call parametric
correspondence, for matching images directly to a
three-dimensional symbolic reference map.

The map contains a compact three-dimensional
representation of the shape of major
landmarks, such as coastlines, buildings, and
roads. An analytic camera model is used to predict
the location and appearance of landmarks in the
image, generating a projection for an assumed
viewpoint. Correspondence is achieved by adjusting
the parameters of the camera model until the
predicted appearances of the landmarks optimally
match a symbolic description extracted from the
image. The matching of image and map features is
performed rapidly by a new technique, called
"chamfer matching", that compares the shapes of two
collections of shape fragments, at a cost
proportional to linear dimension, rather than area.
These two new techniques permit the matching of
spatially extensive features on the basis of shape,
which reduces the risk of ambiguous matches and the
dependence on viewing conditions inherent in the
conventional correlation based approach. The
techniques are described in more detail in [4] and
[3]. They have obvious application to navigation
as well as photo interpretation.
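
The idea behind chamfer matching can be sketched as follows: a distance array is computed once from the image edge points, and each candidate projection is then scored by summing the array along the predicted feature points only, so the cost per candidate grows with the length of the features rather than the image area. The sketch assumes a two-pass city-block distance transform and an average-distance score; these specific choices and the function names are illustrative and are not taken from [4] or [3].

```python
def distance_transform(edges):
    """City-block distance from every cell to the nearest edge cell,
    computed with one forward and one backward pass.
    edges[r][c] is truthy where an image edge point lies."""
    h, w = len(edges), len(edges[0])
    inf = h + w  # larger than any possible city-block distance
    d = [[0 if edges[r][c] else inf for c in range(w)] for r in range(h)]
    for r in range(h):
        for c in range(w):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < w - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d


def chamfer_score(dist, points):
    """Average distance from predicted feature points (row, col)
    to the nearest image edge; 0 means a perfect overlay."""
    return sum(dist[r][c] for r, c in points) / len(points)
```

In a parametric correspondence loop, the camera parameters would be adjusted to reduce this score.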

Having placed the image into parametric
correspondence with the three-dimensional map, we
are now in a position to predict the image
coordinates of any feature in the map. Figure 8
shows two pictures with the same section of
coastline from the map superimposed on each. This
facility is used in monitoring to indicate exactly
where in the picture to look. Conversely, we can
predict the map features corresponding to any point
in the image. This can be used to facilitate
interactive graphical communication between the
photo interpreter and the data base. In Figure 9,
the user has two images displayed simultaneously
and can point with a cursor at a location in one
image and have the system indicate the
corresponding point in the other. (To perform the
latter function accurately, the system needs to
know the three-dimensional nature of the terrain.
We are still in the process of setting up terrain
data in the map data base, so in these examples the
user supplied the fact that the area in question
has roughly constant elevation.)

Using the camera model and image
calibration permits many photo interpretation
mensuration tasks to be accomplished simply.
Routines exist for determining location, length,
height, or straight-line distance for features
indicated interactively in the image. In Figure
10, the user is measuring the height of a bridge
support. Velocity of objects (e.g. ships or cars)
indicated in two images can also be determined. In
Figure 11, the user indicated a ship in one image,
and the system used the landmark finding process to
locate the same ship in the other image and hence
to determine speed from the deduced distance and
the known time delay between the pictures.
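
Once both image positions have been mapped to ground coordinates, the speed computation itself is simple arithmetic. A minimal sketch, assuming planar ground coordinates in meters and a time delay in seconds (the function name and units are illustrative):

```python
import math


def speed(pos1, pos2, time_delay):
    """Average speed between two ground positions (x, y) observed
    time_delay apart; straight-line motion is assumed."""
    distance = math.hypot(pos2[0] - pos1[0], pos2[1] - pos1[1])
    return distance / time_delay
```

For example, a ship displaced by 300 m east and 400 m north between pictures taken 100 s apart has moved 500 m, an average of 5 m/s.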

The  camera  model  provides  a unifying 
theoretical  foundation  that  subsumes  what  would 
otherwise  be  a collection  of  ad  hoc  trigonometric 
techniques  [6].  Combining  the  map  and  calibrated 
image,  the  system  can  also,  for  example,  determine 
alternative  routes  and  travel  distances  along  roads 
between  indicated  points. 

2. Map-guided monitoring

Having a map and image in correspondence
makes many monitoring tasks simpler, because the
map can indicate where to look and what to look for
in the image. It is important, however, to keep in
mind that a map is only an approximation to
reality: it may be incomplete, be out of date,
suppress details, or contain errors. In order to
monitor or to make a detailed interpretation of an
image, it is necessary to locate image coordinates
of objects more precisely than can be predicted
using the map and calibration. In other words, we
need routines which can take predictions and verify
them in the image. As a first step in that
direction, we developed a guided line tracing
routine that accepts a rough approximation to the
path of linear features, such as rivers or roads,
and extracts a best estimate of the precise path in
the image. It operates by applying a specially
developed line detector in the vicinity of the
approximate path and then finding a globally
optimal path based on the local feature values
[2]. Figure 12 shows the predicted course of a
road in a rural area (darker line). The same road
has also been predicted without making use of the
elevation information in the map (lighter line):
note that this prediction is considerably in error.
Figure 13 shows the result of the tracing process,
obtained fully automatically.
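
One common way to find a globally optimal path from local feature values is dynamic programming; whether the routine in [2] works this way is not stated here, so the following is an assumption. The sketch takes a hypothetical cost array covering a band around the approximate path (low cost where the line detector responds strongly), chooses one column per row, and allows the path to shift by at most one column between rows.

```python
def trace_path(cost):
    """Minimum-total-cost path from the top row to the bottom row.
    Returns the chosen column index for each row."""
    h, w = len(cost), len(cost[0])
    total = [cost[0][:]]   # best accumulated cost ending at each cell
    back = []              # back[r-1][c] = best predecessor column
    for r in range(1, h):
        row, brow = [], []
        for c in range(w):
            cands = [k for k in (c - 1, c, c + 1) if 0 <= k < w]
            best = min(cands, key=lambda k: total[-1][k])
            row.append(total[-1][best] + cost[r][c])
            brow.append(best)
        total.append(row)
        back.append(brow)
    # Start from the cheapest end point and follow the back pointers.
    c = min(range(w), key=lambda k: total[-1][k])
    path = [c]
    for brow in reversed(back):
        c = brow[c]
        path.append(c)
    path.reverse()
    return path
```

Because the accumulated cost is minimized over the whole band, a locally weak detector response does not break the trace the way a greedy follower might.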

The  tracing  routine  can  be  used  in  two 
ways:  to  verify  the  presence  of  known  cartographic 
features, using prediction from the map, and to
interactively  trace  new  features  for  incorporation 
into  the  map,  using  a guideline  sketched  by  the 
user.  The  tracing  of  linear  features  is  currently 
a tedious  manual  process  that  constitutes  a major 
bottleneck  in  map  production  [1]  [7]. 

Having a map and image in correspondence
makes the automation of many monitoring tasks
feasible. Keeping track of boxcars in a railyard,
for example, is a typical tedious photo
interpretation task. Knowing the layout of the
tracks makes the task essentially a one-dimensional
template matching problem. A routine
has been developed which flies statistical
operators along a track line to hypothesize
possible ends of boxcars. These hypotheses are
used with knowledge of standard boxcar lengths and
characteristics of empty track to locate the gaps
between boxcars. The program then marks the cars
with a red dot (Figure 14) and reports their number,
classified by length [2].

Estimating  highway  traffic  is  a problem 
of  significant  military  importance,  which  could  be 
approached  by  flying  car  and  truck  templates  [8] 
along  the  path  determined  by  the  guided  road 
tracer. We plan to attack this problem in the near
future  (see  Section  IV). 

Monitoring  the  presence  of  ships  in  a 
harbor  is  particularly  easy  to  automate  when  the 
map  contains  details  of  berths.  Given  a question 
about  the  status  of  a particular  harbor  at  a 
particular  time,  the  appropriate  image  is  retrieved 
from  the  data  base.  The  ship  monitoring  routine 
then  projects  berth  locations  from  the  map  onto  the 
image (Figure 15) and uses an edge histogram of
that region to determine whether the berth is
occupied  (Figure  16).  The  same  process  works 
equally  well  for  vertical  or  oblique  imagery  as 
shown  in  Figure  17. 

The  key  to  automatic  monitoring  lies  in 
having  the  capability  to  place  the  image  into 
correspondence  with  the  map,  which  then  accurately 
specifies  where  to  look.  A relatively  simple  test 
may then be used in that limited context. We have
implemented three representative demonstrations of
this  approach  and  believe  that  many  others  are 
possible,  especially  in  remote  sensing  [9].  In  a 
production  environment,  such  monitoring  could  be 
performed  automatically  on  a continuing  basis  as 
new  imagery  arrived. 


B.  System  Integration 

In  this  section  we  discuss  the  integration  of 
the  above  Hawkeye  components  into  a useful  photo 
interpretation system.


1. System organization

The  Hawkeye  system  consists  of  several 
independent  processes  (forks)  which  interact  by 
means  of  inter-process  messages.  Each  process 
performs  a specific  set  of  functions,  either  for 
the  use  of  other  Hawkeye  processes  (e.g.  the 
display handler) or in the direct interests of the
human user (e.g. a top-level task specialist, such
as the railyard monitor). The former processes are
"server  processes",  whereas  the  latter  may  be 
classified  as  "user  processes." 

A server process is associated with each
external connection, that is, device or data base.
Each server presents a standard interface to the
rest of the system. Thus, knowledge of the
idiosyncrasies of a particular device or data base
is required only within the process dedicated to
it.

The Hawkeye system currently consists of
the following basic modules:

* Natural language interface
  - help facility
  - command interpreter
  - question answering

* Graphics tablet
  - digitization of pictorial
    data (maps, photos)
  - graphical communication
    (pointing, menus)

* Display
  - shared access to display
    by multiple modules
  - graphical communication

* Generalized map knowledge base
  - repository of cartographic
    and cultural data
  - generic definitions of
    semantic objects
  - image library and index
  - question answering about data base

* Map/image correspondence
  - determination of camera and
    digitization parameters
  - determination of transformations
    between map and image

* Task specialists
  - mensuration
  - road tracer
  - railyard monitoring
  - harbor monitoring

Each module is written in an appropriate
language (INTERLISP, SAIL, FORTRAN, or MACRO) with
its own data structures. The total size of these
programs exceeded the address space of the host
machine (KL-10) several times over.

The user communicates with the system via
the natural language interface module (written in
INTERLISP) which then calls upon appropriate
"server" modules to carry out his request. The
user interface module is also responsible for
tasking and setting up the system's server modules.

The map/image correspondence and task
specialist modules were discussed above, under the
heading of Image Analysis. The following sections
describe the supporting modules and the details of
inter-module communication.

2. Inter-module communication

Inter-process communication is
implemented using the Inter-Process Communication
Facility (IPCF) of the TOPS-20 operating system.
This facility, which only recently became
available, provides significantly better
interaction than the pseudo-teletype and shared-
page facilities to which we were limited under
TENEX. IPCF enables processes (including forks and
jobs) to send and receive messages in the form of
packets, up to a page (512 words) in length.
Messages are copied from the sender's address space
and placed in an input queue in system space. The
packet remains in the queue until the receiver
requests it, at which time it is copied to the
receiver's address space.

Messages consist primarily of requests
and responses with the format:

Source ID : Destination ID : Message ID : Message Data

The source and destination IDs are the
unique process IDs assigned to the associated
processes at the time they were created. The
message ID identifies the current request.
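
The four-field message format can be illustrated with a
small sketch. This is not the IPCF interface itself: the
names Packet and make_response, the one-word-per-ID header,
and the choice to count the header against the 512-word
page limit are all assumptions made for illustration.

```python
from dataclasses import dataclass

PAGE_WORDS = 512   # packet size limit quoted in the text
HEADER_WORDS = 3   # assumption: one word per ID field


@dataclass(frozen=True)
class Packet:
    source_id: int
    destination_id: int
    message_id: int
    data: tuple = ()  # message data, one item per word (assumed)

    def __post_init__(self):
        # Reject packets that would not fit in a single page.
        if HEADER_WORDS + len(self.data) > PAGE_WORDS:
            raise ValueError("packet longer than one page")


def make_response(request, data):
    """Reply to a request: swap the two process IDs, keep the message ID."""
    return Packet(request.destination_id, request.source_id,
                  request.message_id, tuple(data))
```

Keeping the message ID unchanged in the reply is what would
let a requester pair a response with the request it posted.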

Once a request is posted, the requester
(sender) will usually wait until the requestee
(receiver) has finished processing the request.
The requestee, on the other hand, is typically a
server and operates by waiting for a request,
processing it to completion, sending a response,
waiting for the next request, etc. Servers are not
interrupted by incoming requests while processing a
previous request. Thus, we do not, in general,
have a coroutine structure. Such a scenario arises
from the fact that processes are necessary not so
much to achieve parallelism, but to acquire a
larger address space and to integrate multiple
languages.
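
The request/response discipline just described can be
sketched with ordinary queues standing in for the IPCF
input queues. Threads replace the separate forks here, and
the names server_loop and call, together with the None
shutdown sentinel, are hypothetical conveniences rather
than parts of Hawkeye.

```python
import queue
import threading


def server_loop(inbox, outboxes, handler):
    """Wait for a request, process it to completion, respond, repeat."""
    while True:
        request = inbox.get()       # blocks until a request arrives
        if request is None:         # sentinel (assumed) to stop the server
            break
        source, message_id, data = request
        result = handler(data)      # later requests simply wait in the queue
        outboxes[source].put((message_id, result))


def call(inbox, reply_box, source, message_id, data):
    """Post a request, then wait until the server has answered it."""
    inbox.put((source, message_id, data))
    return reply_box.get()
```

Because the server takes one request at a time from its
queue, no locking is needed inside the handler; the price
is that a slow request delays everything queued behind it.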

Interface routines are needed to match
the basic message passing machinery to the various
programming languages used in implementing Hawkeye
modules. For example, LISP interfacing routines
will accept arbitrary objects (lists, arrays,
numbers etc.) for transmission: one such routine
accepts any LISP S-expression, and transmits it to
another LISP process that evaluates it and returns
the result.


3. Display server

The display server in a multi-process
system has responsibility for allocating portions
of the display area to requesting processes. For
simplicity and modularity each process should
believe that it has access to its own private
display for presenting pictures and graphics, and
for obtaining graphic input from a cursor. In
actuality, processes are allocated windows (regions
of the physical display area) and the display
server performs the necessary coordinate
transformations.
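
The coordinate transformations performed on behalf of a
window can be sketched as follows. The Window class, its
field names, and the 512-unit private coordinate range are
illustrative assumptions; the text does not give the actual
display geometry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    x0: int                  # physical position of the window's corner
    y0: int
    width: int               # physical extent of the window
    height: int
    virtual_size: int = 512  # assumed size of each process's private display

    def to_physical(self, vx, vy):
        """Map a point on the private display to the physical display."""
        if not (0 <= vx < self.virtual_size and 0 <= vy < self.virtual_size):
            raise ValueError("point outside the private display")
        return (self.x0 + vx * self.width // self.virtual_size,
                self.y0 + vy * self.height // self.virtual_size)

    def to_virtual(self, px, py):
        """Map a physical cursor position back; None if outside the window."""
        if not (self.x0 <= px < self.x0 + self.width
                and self.y0 <= py < self.y0 + self.height):
            return None
        return ((px - self.x0) * self.virtual_size // self.width,
                (py - self.y0) * self.virtual_size // self.height)
```

Returning None for a cursor position outside the window is
one way the server could decide which process, if any,
should receive a given graphic input.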

4. Graphics tablet server

The tablet server has a similar role to
that of the display server, except that it is used
only for graphical input. The surface area of the
digitizing table may contain many different
documents, such as a conventional topographic map,
photographs of the area, and command menus. The
tablet can be used in many ways. Landmarks in maps
or photos can be manually indicated, and thus the
correspondence between the document and the world
model established. Features can then be traced for
input to the map data base, using routines which
display the tracings and permit real time editing.
The tablet cursor can be used just like the display
cursor for pointing in mensuration and question
answering tasks.

5. Map data base server

The data base server is the means of
access to the map data base for other system
modules which are not required to have detailed
knowledge of its structure or implementation. This
server contains access routines for answering a
variety of standard queries about specific data and
the general format of the data base: for example,
"what is at (x,y,z)", "what is the closest road to
(x,y,z)", "what roads are contained in the area
bounded by ...", "where is Oakland Mole", "what is
the <attribute> of <object>", "what attributes does
<object> have", and so forth.

The map data base contains three-
dimensional descriptions of cartographic and
cultural features, including coastlines, major
roads, lakes, bridges, airfield runways, oil
storage tanks, and harbor lights. In addition, the
map contains a partial taxonomy of world entities,
with relevant general semantics, information about
available imagery, and descriptions of data
structures used by the system. The information
about imagery includes file name, calibration data,
and geographic area covered and can be used in
selecting appropriate pictures for specific tasks.
The descriptions of the data structures enable the
system to construct automatically new entities of
the correct structure for inclusion in the data
base.
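
A few of the standard queries handled by the data base
server can be sketched over a toy in-memory map. The entity
names, the ISA and POINTS attributes, and the coordinates
below are invented for illustration, and the closest-entity
query measures distance to stored points only, not to the
line segments between them.

```python
import math

# Hypothetical entries; names and coordinates are illustrative only.
MAP = {
    "ROAD-A": {"ISA": "ROAD", "POINTS": [(0, 0, 0), (10, 0, 0)]},
    "ROAD-B": {"ISA": "ROAD", "POINTS": [(0, 5, 0), (10, 5, 0)]},
    "TANK-1": {"ISA": "OIL-TANK", "POINTS": [(3, 4, 0)]},
}


def attribute_of(obj, attribute):
    """'What is the <attribute> of <object>'; None if it has no such attribute."""
    return MAP[obj].get(attribute)


def attributes(obj):
    """'What attributes does <object> have'."""
    return sorted(MAP[obj])


def closest(kind, point):
    """'What is the closest <kind> to (x,y,z)', judged by stored points."""
    candidates = [name for name, props in MAP.items() if props["ISA"] == kind]
    return min(candidates,
               key=lambda name: min(math.dist(point, q)
                                    for q in MAP[name]["POINTS"]))
```

A caller needs only these routines, not the layout of the
entries, which is the separation the server is meant to
provide.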

The map data base is a disk-based
semantic net data structure that can contain
realistic quantities of data represented in a way
which permits efficient access. Entities are
represented by LISP atoms (e.g. English words),
and information associated with the entity is
stored in a property list format. Relationships to
other entities are also stored on the property
lists, thus establishing a network structure in the
data base. When information concerning a
particular entity is sought, the property list is
retrieved from disk and established in core. A
"paging" scheme limits the amount of data in core
(to, say, 1000 entities) and writes entities back
out to disk, if necessary, the least recently used
ones first [2]. Retrieval of the information is
by means of a hash table on disk, which means that
access time is constant and independent of data
base size. The geometric data are indexed (the
index structure is part of the data base) via K-D
trees [10], one tree for each class of entity
sought, to enable fast retrieval of information
relevant to a particular area. We are continually
refining representations for the basic map entities
in order to increase the richness of information
and retain efficiency of retrieval.
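
The "paging" scheme for property lists can be sketched as a
least-recently-used cache. A Python dictionary stands in
for the hash table on disk, and the class name EntityCache
and its methods are assumptions for illustration; the
actual scheme is the one described in [2].

```python
from collections import OrderedDict


class EntityCache:
    """Keep at most `capacity` property lists in core, evicting the LRU one."""

    def __init__(self, disk, capacity=1000):
        self.disk = disk            # stand-in for the on-disk hash table
        self.capacity = capacity
        self.core = OrderedDict()   # entity name -> property list (a dict)

    def get(self, name):
        if name in self.core:
            self.core.move_to_end(name)      # mark as most recently used
            return self.core[name]
        props = dict(self.disk[name])        # "read in" from disk
        self.core[name] = props
        if len(self.core) > self.capacity:
            # Evict the least recently used entity and write it back.
            old_name, old_props = self.core.popitem(last=False)
            self.disk[old_name] = old_props
        return props
```

Writing every evicted entity back, whether or not it was
changed, keeps the sketch short; a real implementation
would presumably track which entities had been modified.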

We are setting up a map of the San
Francisco Bay Area, containing major features,
coastlines, bridges, and highways. Figure 18 is a
portion of a U.S. Geological Survey (USGS) map of
the area; Figure 19 shows the portion of the map
currently in the data base. Figure 20 shows part
of the map at higher resolution. The map consists
of about 4000 points, plus various semantic
relationships, totaling about three-quarters of a
million bytes of disk storage. (Access to a
particular item of information takes less than a
millisecond if it is paged in, and fifteen to
thirty milliseconds plus disk access time if it has
to be read in.) The map information is entered by
manually tracing features on a USGS map using a
digitizing table: map data in digital form are not
available, and the problem of digitizing printed
maps has rather different constraints from the
problem of making maps from photographs, so we
could not exploit our guided tracing techniques.
We will soon bring up a terrain data base which
will provide the elevation of any location in the
map.

6. User interface

When a system becomes large and complex,
ease of user interaction is essential. The user
interface module provides natural language
communication with Hawkeye. Capabilities include
querying the data base, commanding actions, such as
calibration of an image, mensuration, or
monitoring, and querying the system about available
facilities and how to use them.

The user interface is implemented with
LIFER, a proprietary language definition and
parsing system developed at SRI by Hendrix [11].
LIFER uses an augmented transition net grammar
whose symbols correspond to semantic as well as
syntactic entities. LIFER makes it particularly
easy to achieve robust dialogs about a limited
domain, facilitated by such features as acceptance
of elliptical input and the ability to extend the
grammar incrementally in English as deficiencies
are discovered.

LIFER interfaces have been designed for
several large AI programs including the ACCAT
testbed supported by ARPA [12]. A unique requirement
arising with pictorial data is the need for
graphical communication, such as pointing with a
cursor in an image, in conjunction with natural
language commands. For example, "What is this?"
or "What is the distance between here and here?".
The LIFER grammar has been written to parse such
expressions, requesting coordinates from the
servers providing graphical input.

The natural language interface permits
tasking via requests in English. Hawkeye will
notify the user when requests arrive, printing them
if he desires. The user can ask the system to read
the requests, parse and execute them if it can.
The system will alert the user if it cannot
understand or carry out a request, so that it may
be fulfilled interactively. It will also notify
him when it has finished all outstanding tasks.


IV  NEW  DIRECTIONS:  ROAD  MONITORING 

Hawkeye demonstrated the feasibility of using
knowledge about maps and imaging to automate a
variety of representative photo interpretation
tasks. With this knowledge, adequate performance
was achieved in straightforward cases, but the
system was easily misled by contingencies that it
did not know about, for example, clouds. In order
to approach human performance, substantially more
world knowledge is necessary. In the next stage of
our research, we plan to develop a system with
considerable expertise in a specialized task area.

The task we have selected is that of
monitoring traffic on roads. More specifically,
given a sequence of reconnaissance images of a
region under surveillance, possibly taken under
adverse viewing conditions, the system will first
locate sections of known roads visible in the
images, locate anomalous regions on the roads whose
size, shape, velocity and other characteristics are
consistent with those of vehicles, and then perform
a detailed scene analysis in the vicinity of the
anomalies in order to identify specific vehicle
types.

The road monitoring system is being organized
as two expert subsystems designated the "Road
Expert" and the "Vehicle Expert". The Road Expert
will have the task of analysing imagery at low
resolution to:

* Establish image-map correspondence (Camera
Calibration Algorithm).

* Locate in the imagery the visible segments
of roads to be monitored (Low Resolution
Road Detector).

* Accurately mark the road boundaries and
determine their map coordinates
(Intermediate Resolution Road Detector).

* Search for anomalies within the marked road
boundaries (Road Anomaly Detector).

* Confirm potential vehicle existence by
comparison with previously acquired
imagery, and with general knowledge of
vehicle behavior using data base support
(Symbolic Reasoning Module).

The "Vehicle Expert" will examine at high
resolution the potentially interesting objects
found by the Road Expert. It will have the task
of:

* Producing a description of the 3-
dimensional geometry of the road objects
(Geometric Description Module).

* Comparison of observed object descriptions
with stored descriptions of vehicle types
(Vehicle Identification Module).

In order to attain the level of performance
for which we aim, the system will require knowledge
of a wide variety of situations and events, such as
obscuration of roads by trees or clouds, the visual
effects of snow and rain, the behavior of roads at
intersections, mountains, tunnels and so forth.
The system will also require knowledge of its
repertoire of resources, their abilities and
limitations, and how to evaluate its own
performance. The Hawkeye system framework provides
a suitable foundation for integrating all the
capabilities and knowledge into a unified system.


Figure 3. Correlation matching of an image chip


Figure 5. Landmarks located in the sensed image


Figure 6. An oblique sensed image


Figure 7. Matching vertical and oblique views


Figure 8. The map projected onto two pictures


Figure 9. Indicating corresponding points


Figure 13. The road after automatic tracing


[...] predicted from the map


Figure 15. Harbor pier


Figure 16. Berthed ships detected


Figure 17. Berthed ships in an oblique image


Figure 18. A USGS map of San Francisco Bay


Spatial Understanding Overview


The objectives of this research are to interpret image sequences
in terms of spatial features and spatial relations, and to use
shape knowledge in low-level and high-level vision. The
research is directed to applications in stereo photointerpretation,
stereo change detection, and terminal guidance for strategic and
tactical devices. The goal is to develop techniques for passive
ranging and interpretation which can be used in guidance and
monitoring despite differences in sensors, viewpoint, sun angle
and weather or surface conditions such as rain or snow.

Collaboration 

We now have a collaborative arrangement with the Signal
Processing Laboratory of Lockheed Missiles and Space Co.
Support is being sought for a joint proposal involving Stanford
University, Lockheed, and SRI. In this arrangement, Lockheed
would provide system integration and implementation, along
with input concerning military relevance.


Representation 

We  have  begun  work  on  representing  commercial  aircraft  for 
use in interpretation of airfield imagery. A new program for
representing  complex  objects  and  displaying  symbolic  structures 
in  their  appearances  is  being  designed. 

Depth  Mapping 

An earlier level program based on obtaining range
measurements at a coarse sample of points was described in
previous proceedings of the Image Understanding workshops.
This sample was dense enough to obtain ground surface
descriptions. A program which produces a denser and more uniform
range map is described in detail in the paper by Gennery
in these proceedings. Application is shown to an aerial view of a
parking lot scene, a ground level parking lot scene, and an aerial
view of an apartment building. In all three scenes, it locates the
ground surface and finds areas above the ground surface, with
some mistakes. In the apartment house scene, it picks out the
roof surface also.

In the near future, the depth mapping program will be used to
produce maps of the Pittsburgh scene of CMU, and portions of
passenger terminals at San Francisco airport. To deal with large
images, initially they will be done in smaller pieces. We expect to
make a version of the program which proceeds row by row to
cut down the amount of storage necessary, which will allow
processing entire pictures of the size of the Pittsburgh series.


A high  resolution  range  mapping  program  will  follow  which 
determines  object  boundaries  to  higher  resolution  and  which 
can  discriminate  small  regions. 

Interactive  Vision 

A new direction of research has begun whose goal is a
MYCIN-like system in which image understanding programs can
be built by non-expert users. It is intended to generate programs
to find airfields and oil tanks, in a manner that extends
naturally to a much larger class of image understanding
tasks. The system depends upon a program which draws
conclusions from the representation of shape of objects.


Database 

We  have  extended  our  data  base  to  include  a mapping  sequence 
of  aerial  photographs  of  San  Francisco  airport  suitable  for 
stereo.  We  are  negotiating  for  use  of  another  mapping  sequence 
to  use  for  stereo  change  detection.  We  have  obtained  a stereo 
pair  of  urban  areas  in  Pittsburgh  from  Carnegie  Mellon 
University.  We  have  arrangements  to  get  digitized  data  from  a 
SAR  flight,  along  with  photographs  which  cover  the  same  area. 
We are grateful to CMU, USC, Lockheed, Hughes and the
engineering department of the San Francisco International
Airport for valuable assistance in obtaining and digitizing
imagery.


OVERVIEW OF THE ROCHESTER IMAGE UNDERSTANDING PROJECT


1. Basic Theory

Our  basic  approach  to  vision  was  set  out  in 
[Ballard  and  Brown,  1977].  The  query-driven,  top- 
down  approach  to  vision  is  especially  well  suited 
to  application  as  opposed  to  purely  fundamental, 
non-applied  research. 

The  achievement  here  is  to  carve  out  a use- 
ful subset  of  ideas  and  techniques  from  the  many 
proposed  for  vision,  and  to  organize  them  in  a 
flexible  and  extensible  way  in  the  service  of 
particular  domains. 

Our  three-layered  structure  (image  data, 
sketchmap,  and  model)  and  associated  constructs 
allow  the  use  of  previously  acquired  results  and 
the  increasing  automation  of  image-understanding 
decisions.  Our  applications  programs  use  these 
basic  ideas.  In  writing  the  applications  programs 
(involving  aerial  and  biomedical  images)  we  have 
improved  and  tested  our  ideas  on  particular  vision 
mechanisms  (executive  procedures,  constraint  net- 
works, location  descriptors,  etc.  [Ballard,  Brown 
and  Feldman,  1977]),  data  structures,  and  control 
of  vision  processing  (see  Section  2). 


2.  Application  to  Ship  Finding 

In  this  simple  application  of  the  system, 
docked  ships  are  located  in  harbor  scenes  [Brown 
and Lantz, 1977]. The system, under direction of
the  user-written  query,  begins  by  deciding  where 
to  look  by  satisfying  a constraint  network;  the 
more  information  provided,  the  narrower  the  focus 
of  attention.  Recent  work  at  SRI  [Barrow,  1977] 
has  shown  that  map  data  may  be  automatically 
registered  with  images  such  as  ours  to  within 
better  than  a pixel,  so  we  felt  comfortable  about 
bypassing  the  registration  problem  in  this  study. 
Were  the  registration  uncertain,  the  constraints 
would  produce  a more  fuzzy  area  to  search  than 
they  did. 

This  application  has  provided  a test  bed  for 
the  constraint  network  idea,  as  well  as  for  the 
representation  of  subsets  of  2-space  (linear  and 
area  objects)  and  for  set-theoretic  operations  on 
them  (union  and  intersection). 
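
As a rough illustration of how set-theoretic operations on
subsets of 2-space can narrow the area to be searched, the
sketch below represents area objects as sets of grid
points. This is not the representation used in our system;
the helper names rectangle and within, the grid, and the
"beside a pier and on the water" constraint are assumptions
chosen to keep the example small.

```python
def rectangle(x0, y0, x1, y1):
    """Grid points with x0 <= x < x1 and y0 <= y < y1."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def within(region, distance):
    """All grid points within `distance` (in either axis) of the region."""
    return {(x + dx, y + dy)
            for (x, y) in region
            for dx in range(-distance, distance + 1)
            for dy in range(-distance, distance + 1)}


# Hypothetical map data: a strip of water and a pier jutting into it.
water = rectangle(0, 0, 10, 5)
pier = rectangle(4, 0, 6, 8)

# Where might a docked ship be? Next to the pier, on water, not on the pier.
search_area = (within(pier, 1) & water) - pier
```

Each further constraint is one more intersection or
difference, so additional information can only shrink the
area searched.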


3.  Display  of  3-D  Data 

In  many  applications  it  is  useful  to  charac- 
terize the  orientational  properties  of  3-D 
structures.  For  example,  the  surface  normals  of 
mountainous  areas  are  more  varied  in  direction 
than  those  of  plains  areas.  Two  descriptions  of 
3-D  orientation  for  general  3-D  wire-frame 
structures  were  designed  [Brown,  1976a;  Brown, 
1976b;  Brown,  1977].  The  descriptions  provide  a 
generally  useful  tool  for  display  of  certain  3-D 
volumes,  especially  well  suited  for  raster 
graphics  [Brown  and  Selfridge,  in  preparation]. 
Some  basic  mathematical/statistical  questions 
raised  here  were  answered  by  J.  Wellner  in  our 
Statistics  Department  [Wellner,  to  appear],  and 
he  and  his  students  are  presently  applying  the 
theoretical results to provide practical statisti-
cal tests for 3-D vector data [Wellner, in
preparation].


4.  Component  Building 

4.1  Hardware 

A second  Eclipse  computer  was  purchased  for 
use  by  the  Vision  Laboratory.  It  has  its  own 
ETHERNET board (also acquired), and will control
a Grinnell  high-resolution  color  raster  display 
device  (also  acquired,  but  not  yet  in  house). 

For all this, and to free up the RIG system, new
memory boards were purchased and their controller
designed, built, and debugged. Acquisition of an
image input device is proceeding; upgrading low-
latency mass storage for images is also proceeding.
All of the above were acquired with non-DARPA
funds. The disk unit budgeted for this
period has been ordered, but has not yet arrived.

4.2  Software 

Basic software support for the system has
been operational for six months and is improving
daily (see Section 7). Communications programs
and protocols are under development for distributed
computing. Controlling programs for the
Grinnell display are being written. SAIL code has
been written for the basic data structures in the
vision system and for operations on them. SAIL
code has been obtained from Carnegie-Mellon
University (their entire vision laboratory package),
Stanford, and USC (e.g., the Hueckel
operator), and made to run on our system. Many
SAIL utilities for image management, transformation,
and transmission have been written. A
general header for use in transmission of images
in distributed computing environments has been
designed and is in use. An image protocol for use
in distributed computing environments has been
designed [Maleson, Rashid and Nabielsky, 1977], to
allow interprocess communication of image data
(see Section 8).


5.  Texture  Understanding 

Standard  texture  analysis  techniques  rely  on 
the  calculation  of  a set  of  features  (like  edge 
probability  per  unit  area,  or  local  neighborhood 
co-occurrence  probability  matrices)  on  training 
sets  of  images,  taking  statistical  measures  of 
these  features  for  each  training  set  (mean,  stand- 
ard deviation,  entropy,  etc.),  and  partitioning 
the  feature  hyper-space  so  that  each  partition 
contains  exactly  one  training  set.  Unknown 
texture  patches  are  now  measured  by  the  same 
feature  operators  to  determine  their  location  in 
feature  hyper-space,  and  are  assigned  the  texture 
class  of  the  appropriate  partition.  This  tech- 
nique works  well  for  limited  domains,  where  an 
accurate  training  set  can  be  chosen,  and  where 
textures  exhibit  variation  in  the  local  features 
measured.  Rotations  and  scale  changes  result  in 
a new  texture  class  assignment. 
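
The standard scheme can be sketched in a few lines. Here
the feature hyper-space is partitioned by assigning each
point to the nearest class mean, which is only one of many
possible partitioning rules; the two feature values per
patch and the class names in the usage note are made up for
illustration.

```python
import math


def train(training_sets):
    """Map each texture class to the mean of its training feature vectors."""
    means = {}
    for label, vectors in training_sets.items():
        count = len(vectors)
        means[label] = tuple(sum(column) / count for column in zip(*vectors))
    return means


def classify(means, features):
    """Assign an unknown patch to the class whose mean is nearest."""
    return min(means, key=lambda label: math.dist(means[label], features))
```

With hypothetical training sets such as
{"grass": [(0.8, 0.1), (0.6, 0.3)], "water": [(0.1, 0.9),
(0.3, 0.7)]}, a patch measured at (0.75, 0.2) would be
assigned to "grass". If rotating or rescaling the patch
changes its measured features, the same code may return a
different class, which is the weakness noted above.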

We  are  approaching  the  texture  problem  by 
dividing  texture  regions  into  meaningful  sub- 
elements of  similar  intensity  sample  points,  then 
using  rotation-  and  scale-invariant  shape  measures 
to  characterize  these  regions,  and  finally  deter- 
mining spatial  relationships  among  our  sub- 
elements. By  using  a decision-tree  program 
structure,  easily  discriminated  textures  are 
separated  quickly,  and  more  complex  textural 
structure  is  only  extracted  when  necessary.  This 
texture  analysis  scheme  not  only  classifies  tex- 
ture patches  into  sets,  but  also  produces  a 
description  of  similarities  and  differences  among 
different  patches.  That  information  is  then 
available  to  higher-level  semantically-driven 
processes,  and  is  more  useful  than  a binary  same/ 
different  decision.  In  this  period,  we  have 
completed  a prototype  texture  analysis  system  that 
demonstrates  the  feasibility  of  this  approach 
[Maleson,  1977]. 


6.  Semantic  Nets,  Frames,  and  Associations 

A knowledge  representation  system  was  devel- 
oped which  is  based  on  the  use  of  a semantic  net 
on  which  a higher-level  structure  of  frames  has 
been  superimposed.  The  system  was  designed  for 
use  with  a natural  language  system  which  is  espe- 
cially concerned  with  finding  the  correct  senses 
of  ambiguous  words  in  context.  An  examination  of 
several  linguistic  examples  shows  how  the  repre- 
sentation system  facilitates  associative  searches 
of  context  for  potentially  appropriate  senses  of 
ambiguous  words,  and  how  the  results  of  such 
searches  can  often  provide  definite  referents 
[Hayes, 1976; Hayes, 1977a; Hayes, 1977b]. The
applications of this model to the image under-
standing world modelling problem are being
explored.


7. Support System Development

The major accomplishment of this period was
the bringing of the RIG system into full production
use. The RIG system consists of 4 64KW minicomputers
(intended primarily for stand-alone use and
possessing local disk storage and high resolution
raster displays) connected in a 3 MHz ring network
to a Data General Eclipse. The Eclipse maintains
a modestly large local file capacity (~100 MB),
hard copy printing and plotting, and magnetic tape.
It also provides editing and other facilities to
a number of local terminals, and serves as a gateway
between larger campus machines (360/65 and
KL 10), the ARPANET (as a VDH), and our local
network. We have also obtained funding from the
National Science Foundation for an Image Understanding
Laboratory, and will add it to RIG in the
coming year [Ball et al., 1976; Feldman & Rashid, 77].


8.  Image  Protocol  Development 

The  Rochester  Image  Protocol  is  being  devel- 
oped within  the  RIG  framework  and  governs  communi- 
cation between  image  handling  processes  in  our 
network.  It  is  built  around  the  concept  of  a 
structured  image  definition  similar  in  spirit  to 
the structured graphics display files of [Sproull
and  Thomas,  1974].  This  image  data  structure 
serves  both  as  a common  language  for  describing 
images  and  as  a uniform  way  of  specifying  the  dis- 
play of  image  data  on  various  raster  devices  (e.g., 
plotting  devices,  black  and  white  and  color 
variable  intensity  and  simple  intensity  displays). 


9. Programming Language Development

We developed and made available to the
community important corrections and improvements
to the compiler for SAIL, the most widely used
language in the DARPA image understanding effort.
We are continuing to maintain and improve the SAIL
language [Rashid, 76] while working on a new system.

Some fundamental properties of distributed
computing (DC) do not occur in conventional
programming and these properties lead in a natural
way to programming language constructs. The most
obvious property is that a distributed computation
is spread among several computers which are
assumed to be connected by some communication
paths. For the foreseeable future, these communication
paths will be less reliable and have lower
bandwidth than is available in the processors
themselves. This leads us to expect that DC programs
will be made up largely of self-contained
modules which will share very little information
directly. One would also want to have the communication
between modules be some asynchronous message
protocol rather than subroutine or coroutine
calls where one module would always have to wait
for a response from the other. It appears to us
that the module-message paradigm is inherently
well suited to DC and is likely to appear in some
form in any proposed high-level language for DC.

Starting from the basic module-message paradigm,
we have been attempting to develop a new
generation of high-level programming languages
which would incorporate as much as possible of
the computer science of the last decade. This
overly ambitious project is called PLITS (Programming
Language in the Sky). In addition to DC,
the PLITS effort is attempting to encompass our
current knowledge of software reliability,
language extensibility, and automatic programming.
Current efforts include the construction and use
of an interim PLITS and a number of basic studies
on particular issues. The most advanced is a
careful definition of the PLITS-DC proposals,
expressed as a gedanken extension to PASCAL. Other
issues are addressed in [Feldman, 1976] and forthcoming
papers.


OVERVIEW

The objective of our research is to achieve
better understanding of image structure and to
improve the capability of image processing systems
to extract information from imagery and to
convey that information in a useful form. The
results of this research are expected to provide
the basis for technology development relative
to military applications of machine extraction
of information from aircraft and satellite
imagery.

A block diagram of an Image Understanding
System is shown in Fig. 1. We first consider
the left side of the block diagram. After the
sensor collects the image data, the preprocessor
may either compress it for storage or transmission
or it may attempt to put the data into a
form more suitable for analysis. Image segmentation
may simply involve locating objects in
the image or, for complex scenes, determination
of characteristically different regions may be
required. Each of the objects or regions is
categorized by the classifier, which may use
either classical decision-theoretic methods or
some of the more recently developed syntactic
methods. In linguistic terminology, the regions
(objects) are primitives, and the classifier
finds attributes for these primitives. Finally,
the structural analyzer attempts to determine the
spatial, spectral, and/or temporal relationships
among the classified primitives. The output of
the "Structure Analysis" block will be a description
(qualitative as well as quantitative) of the
original scene. Notice that the various blocks
in the system are highly interactive. Usually,
in analyzing a scene one has to go back and forth
through the system several times.

Past research in image understanding and related
areas at both Purdue and elsewhere has indicated
that scene analysis can be successful only
if we restrict a priori the class of scenes we
are analyzing. This is reflected in the right
side of the block diagram in Fig. 1. A world
model is postulated for the class of scenes at
hand. This model is then used to guide each
stage of the analyzing system. The results of
each processing stage can be used in turn to refine
the world model.

Research in image understanding at Purdue
is concerned with all aspects of the block diagram in
Fig. 1. However, the emphasis will lie in the
interaction between the processing stages (left
side of Fig. 1) and in the search for suitable
types of world models. One type of world model
we are looking into combines the ideas of hierarchical
relational graphs and the linguistic
approach.

SUMMARY  OF  RESEARCH  PROGRESS 

We summarize the recent progress of some of
our research projects.

(A) A Syntactic Approach to Image Understanding
- Janmin Keng and K. S. Fu

A syntax-directed method has been investigated
and developed for the land-use classification
of satellite images. In particular, recognition of
highways, rivers, bridges, and commercial/industrial
areas has been successfully achieved at a fully
automated level. Image segmentation and object
detection have also been studied. A syntactic
method that utilizes texture measurements and
tree grammar analysis has been devised and tested
on different images, such as LANDSAT and aerophotographic
images.

(B) Image Segmentation Using Texture and
Grey Level - S. G. Carlton and O. R. Mitchell

The research effort underway concerns the
application of textural features to the image segmentation
problem. The segmentation technique
uses a texture measure that counts the number of
local extrema in a window centered at each picture
point. Four grey level pictures are derived, each
of which represents a texture or grey level property
of the original image. These intermediate
pictures may be viewed as a 2-dimensional image
in which each point consists of a 4-dimensional
vector. These vectors are then clustered into
different groups and averaged to form vectors representative
of each group. The segmentation is
completed by assigning each pixel in the original
image to one of the groups defined by the representative
vectors using a distance criterion.
This process may be structured hierarchically by
repeatedly applying diminishing window sizes.
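
Two of the steps in (B) can be sketched in a few lines of Python.
This is an illustration under stated assumptions rather than the
authors' program: extrema are taken to be strict local maxima or
minima along each row, the window is a square of side 2*half+1
truncated at the image border, and the distance criterion is
squared Euclidean distance. The function names are hypothetical.

```python
def is_row_extremum(row, x):
    """True if row[x] is a strict local max or min of its two row neighbours."""
    if x == 0 or x == len(row) - 1:
        return False
    left, mid, right = row[x - 1], row[x], row[x + 1]
    return (mid > left and mid > right) or (mid < left and mid < right)


def extrema_count(image, half):
    """Count row-wise local extrema in the window centered at each pixel."""
    h, w = len(image), len(image[0])
    flags = [[1 if is_row_extremum(row, x) else 0 for x in range(w)]
             for row in image]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for yy in range(max(0, y - half), min(h, y + half + 1)):
                for xx in range(max(0, x - half), min(w, x + half + 1)):
                    out[y][x] += flags[yy][xx]
    return out


def nearest_group(vector, representatives):
    """Index of the representative vector closest to `vector`."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(representatives)),
               key=lambda k: dist2(vector, representatives[k]))
```

A smooth region yields a count near zero and a busy region a
count near the window area, so the count image separates the two
even when their mean grey levels are similar.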

(C) Random Field Approach to Contextual
Pattern Classification - K. S. Fu and
T. S. Yu

We are concerned with the problem of using
contextual information for classification of remotely-sensed
multispectral data. The problem is
interesting because the occurrences of data
vectors at resolution elements tend to be correlated.
First a torus process is consistently defined
on a rectangular torus. The second order
moments and spectral density function are specified.
The torus process is then extended to form
a spatial process defined on the whole infinite
plane. The classification algorithm employs
the Bayesian strategy, using the pattern to be
classified and its neighbors to provide the optimal
decision. Experimental results on LANDSAT
data have demonstrated the improvement in classification
accuracy when context information is
used.

(D) Fourier Descriptors for Extraction of
Shape Information - T. Wallace and
P. A. Wintz

We have extended the work of Granlund and of
Persoon and Fu on Fourier descriptors (FD) in
several ways. Our present algorithm computes
FDs very efficiently using the FFT, the normalization
procedure is straightforward and much
more efficient than the root-finding technique of
Persoon and Fu, and we preserve all of the shape
information in the original data. This shape
recognition algorithm is powerful and well-tested
but it requires detection of the boundaries of
objects in photographic data before it can be
applied. We have worked with the BLOB boundary-finding
algorithm of Gupta and Wintz, modifying
BLOB so that it is more effective with ordinary
photographic data rather than the multi-spectral
data it was originally intended to process.
(BLOB locates regions of similar mean and variance
in the data.) We recently processed a
photograph containing two airplanes with BLOB and
then examined the shapes of all contours found
with lengths 50 to 1024 with the FD algorithm.
The two airplanes were correctly identified by
the program, and there were no false classifications.
Future plans include extending the FD
algorithm to recognize three-dimensional objects,
and improving present boundary finding techniques.
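
The general idea of a Fourier descriptor can be sketched as
follows. This is not the normalization described above: for
brevity the sketch uses a direct DFT where the authors use the
FFT, and it normalizes by keeping only coefficient magnitudes,
which discards phase and therefore does not preserve all shape
information. It assumes a closed, non-degenerate contour given as
(x, y) points in traversal order; the function names are
hypothetical.

```python
import cmath


def fourier_descriptors(boundary):
    """Direct DFT of a closed boundary treated as complex points x + iy."""
    n = len(boundary)
    z = [complex(x, y) for x, y in boundary]
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) / n
            for k in range(n)]


def normalized_magnitudes(boundary):
    """Magnitudes invariant to translation, scale, rotation and start point."""
    coeffs = fourier_descriptors(boundary)
    scale = abs(coeffs[1])  # assumed non-zero for the contours of interest
    # Dropping coeffs[0] removes translation; dividing removes scale;
    # taking magnitudes removes rotation and start-point phase.
    return [abs(c) / scale for c in coeffs[1:]]
```

A unit square and a copy that has been rotated, scaled, shifted
and traversed from a different corner give the same normalized
magnitudes, which is the property a shape classifier relies on.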

(E) Filtering to Remove Cloud Cover in
Satellite Imagery - O. R. Mitchell and
E. J. Delp III

We are using homomorphic filtering techniques
to remove multiplicative noise effects such as
cloud cover and atmospheric turbulence in ERTS
imagery. Our present approach is to estimate the
cloud statistics directly from the cloudy pictures
using a cloud distortion model. Once the noise
(cloud) power spectrum is obtained, an optimum
filter is derived to separate the signal and
noise. The filter implemented is a three-dimensional
computerized filter. The three dimensions
correspond to two spatial dimensions and
one spectral dimension. The distortion model developed
for the image includes cloud reflection
and transmission and atmospheric turbulence.
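
The reasoning behind homomorphic filtering can be written out
briefly. The equations below are a simplified sketch, not the
distortion model referred to above: they assume a purely
multiplicative cloud term and ignore reflection, and they assume
that the optimum filter takes the familiar Wiener form, with the
log-signal and log-noise uncorrelated. The symbols g, s, n, P_s
and P_n are introduced here for illustration only.

```latex
\begin{align*}
  % observed image g, scene s, cloud transmission term n
  g(x, y, \lambda) &= s(x, y, \lambda)\, n(x, y, \lambda) \\
  % the logarithm turns the multiplicative noise into additive noise
  \log g &= \log s + \log n \\
  % assumed Wiener filter in the log domain; P_s and P_n are the
  % power spectra of \log s and \log n over two spatial frequencies
  % (u, v) and one spectral frequency w
  H(u, v, w) &= \frac{P_s(u, v, w)}{P_s(u, v, w) + P_n(u, v, w)} \\
  % filter in the log domain, then exponentiate
  \hat{s} &= \exp\!\left( \mathcal{F}^{-1}\!\left[ H \cdot
             \mathcal{F}\left[ \log g \right] \right] \right)
\end{align*}
```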

(F) Characterization of Context in Imagery
by Two-Dimensional Random Processes -
P. H. Swain and E. F. Kit

Classification analysis of multispectral
image data is routinely carried out by classifying
a single pixel at a time, extracting information
from the spectral domain and ignoring the two-dimensional
or image character of the data.
Recent studies confirm that there is useful information
in the context of a pixel (e.g., its
neighbors) which can be helpful in identifying the
pixel. Utilization of context, the information
contained in the spatial dependencies among image
points, is an important step on the way to
achieving "image understanding". In this research
the scene is considered to be a multi-dimensional
random process characterizable in terms of its
statistical transition properties. Implementing
classification rules that utilize these properties
without becoming prohibitively expensive in terms of
computational requirements represents a considerable
challenge.

PUBLICATIONS 

1. J. Keng, "A Syntactic Method for Image Segmentation,"
Proceedings of Seventh Annual
Symposium of Automatic Imagery Pattern Recognition,
Electronic Industrial Association,
College Park, Maryland, May 23-24, 1977.

2. T. S. Yu and K. S. Fu, "Contextual Pattern
Classification for Remotely Sensed Multispectral
Data," Proceedings of the Eighth
Modeling and Simulation Conference, Pittsburgh,
Pa., April 1977.

3. O. R. Mitchell, C. R. Myers, and W. Boyne,
"A Max-Min Measure for Image Texture Analysis,"
IEEE Trans. on Computers, May 1977.

4. S. G. Carlton and O. R. Mitchell, "Image Segmentation
Using Texture and Grey Level,"
Proc. of the IEEE Computer Society Conference
on Pattern Recognition and Image Processing,
June 6-8, 1977.


Fig. 1. An Image Understanding System



AUTOMATIC IMAGE RECOGNITION SYSTEM
Program Status, September 1977


R. Larson

HONEYWELL INC.

Systems and Research Center
Minneapolis, Minnesota 55413

This program is entering the second phase,
where the effort will be on simulating an airborne
tactical recognition system. The Autothreshold
hardware development has been completed and
we have begun working on the system structure,
the data base and some of the component subroutines.
This status report is a summary of
these initial steps.

AUTOTHRESHOLD HARDWARE

We have tested the Autothreshold's ability
to adapt to changing levels of contrast and intensity
by using it to extract target images from one
hour of TV recordings of airborne FLIR imagery.
(The test imagery was from the Kreb's data set.)
In the test data the target contrast varied from
approximately 20 per cent to near 100 per cent
and the average background intensity (single
frame) varied from 0.25 volts to 0.75 volts
(relative to a 1 volt maximum). Feature extraction,
for the purpose of detecting man made objects,
appears consistent over this range of conditions
and we expect that the scene adaptation will result
in smaller class variances and therefore in improved
detection accuracy. The man made object
classifier is undergoing real time evaluation at
this time and the results will be reported at the
workshop. ("Adaptive Threshold for an Image
Recognition System")

IMAGE RECOGNITION SYSTEM SIMULATION

The diagram shows the system structure
that Honeywell and Purdue have chosen as the
simulation goal. The left and right sides of the
diagram are concerned with background and
targets respectively. The purpose of the background
classifier is to establish a local context
for detected potential targets, based on both a
priori and current information. The target
analysis begins with man made object (MMO) detection
based on segmentation of the image and preliminary
classification of the segments. From the
current Autothreshold results, we feel that the
Autoscreener will be able to perform this function.

Previous work has shown that one cannot use the
same classifier to recognize both small images
and large images. Small images can be classified
quite well by statistical pattern recognition
methods while large images (images that are
large enough to show object structure) do not
allow good statistical description. Therefore,
we follow the MMO detection by a second screening
that sorts the detected objects by the size of
the images. This screening step will also estimate
the actual dimensions of the object and
reject objects that are the wrong size for the
mission targets. Classification then proceeds
as shown with small images being classified
statistically and large images being classified
structurally.

We have started work on the secondary
screening algorithm using the Autoscreener to
extract objects and provide image size measurements.
Sorting by image size is a trivial task,
but there are some difficulties to be overcome in
estimating the true dimensions of the object
when the object aspect angle is unknown. There
is an additional problem in working with our data
base in that altitude and viewing angles, needed
to determine range, were not recorded and must
be estimated. We will implement the algorithm
as a software modification to the Autoscreener for
on-line testing.
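
The secondary screening logic can be sketched as follows. This
is a minimal illustration, not the Autoscreener modification: it
assumes flat terrain, a small-angle relation between image
extent and physical extent, and a known sensor instantaneous
field of view, and it ignores the unknown aspect angle mentioned
above. All function and parameter names, and the pixel count at
which object structure is taken to be visible, are hypothetical.

```python
import math


def slant_range_m(altitude_m, depression_deg):
    """Range to a ground point from altitude and depression angle."""
    return altitude_m / math.sin(math.radians(depression_deg))


def object_extent_m(extent_pixels, ifov_mrad, range_m):
    """Small-angle estimate of physical extent from image extent and range."""
    return extent_pixels * ifov_mrad * 1e-3 * range_m


def screen(extent_pixels, ifov_mrad, altitude_m, depression_deg,
           min_size_m, max_size_m, structure_pixels):
    """Return 'reject', 'statistical' or 'structural' for one detected object."""
    size = object_extent_m(extent_pixels, ifov_mrad,
                           slant_range_m(altitude_m, depression_deg))
    if not (min_size_m <= size <= max_size_m):
        return "reject"  # wrong physical size for the mission targets
    # Large images show object structure; small ones do not.
    return "structural" if extent_pixels >= structure_pixels else "statistical"
```

When altitude and viewing angle are only estimated, the error in
range carries through linearly to the size estimate, so the size
limits would need to be set with some margin.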

We have made some changes to the inference
process used in the Prototype Similarity segmentation
method, but the work has not proceeded far
enough to report at this time.

DATA  BASE  CONSIDERATIONS 

The purpose of this simulation effort is to
investigate the application of image understanding
technology to a variety of military tactical airborne
missions. Thus there are a number of requirements
that the experimental data base must meet. In
addition to the constraints on target objects, background
types, observing platform and sensor characteristics,
there is the fact that the sensor data
available during a tactical operation will be a
continuing sequence of images that are highly
correlated. To include the information gain
available from sequential processing it is desirable
for the data base to be sequential.


The Kreb's data set is an appropriate,
realistic collection of imagery, but it does not
include imagery from the tactical FLIR's and the
low resolution arrays. Honeywell has initiated
an effort to obtain imagery from these types of
sensors and we would be interested in talking
with other DARPA contractors with similar
desires.


IMAGE UNDERSTANDING RESEARCH AT CMU:

A Progress Report

Raj Reddy

Department of Computer Science
Carnegie-Mellon University
Pittsburgh, Pa. 15213
September 5, 1977

INTRODUCTION

The primary objective of our research effort is to
develop techniques and systems which would lead to
successful demonstration of image understanding concepts
over a wide variety of tasks, using all the available sources
of knowledge. This requires the determination of the type
and nature of knowledge that might be applicable in a given
task situation. The representation, use, and evaluation of
such knowledge must be made within a total system's
context. The research program at CMU is an attempt at
parallel development of various components, incrementally
leading to increasingly complex image understanding
systems.

SYSTEMS AND TASKS

The image understanding research at CMU uses a DEC
System 10/80, C.mmp (a 16 processor multi-mini computer
system), and a dedicated MIPS (Multi-sensor Image
Processing System) computer (see Figure 1).

Our present plans are to attempt to interpret
uncontrived arbitrary images representing different views
of the downtown Pittsburgh area (a 3-D world), and aerial
and satellite views of the Washington, D.C. area (a 2-D
world). The world models for these tasks are expected to
be generated incrementally over the next few years.

KNOWLEDGE  REPRESENTATION  AND  SEARCH 

During the last Workshop we presented our views
about representation of knowledge and search (Rubin and
Reddy, 1977). The PPE graph structure representation of
knowledge tends to be expensive in terms of space
required, but is essential if we wish to use the faster beam-search
techniques for image interpretation. We expect to
embed this particular knowledge representation and search
as the principal component in a total system which will
involve planning (solution in simpler, coarser, or abstract
spaces), iterative dynamic refinement of knowledge
representation, and goal-directed interpretation strategies.

At present we are developing the following
knowledge sources for the downtown Pittsburgh task: a 3-D
model of the downtown Pittsburgh area, knowledge about
building structures and textures, knowledge about local
refinements given coarse recognition (e.g., detecting cars in
roads and trees and bushes next to roads), knowledge about
shadows, occlusions, and highlights, and so on. Given our
basic approach of iterative refinement of knowledge, we will
start with simple versions of these knowledge sources, and
refine them as we observe their limitations when applied to
different scenes.

Since the last Workshop our work has continued on
the ARGOS Image Understanding System (Rubin and Reddy,
1977). Two techniques have been developed for pruning
the network as the image is labeled. These techniques
enable ARGOS to work with very large networks while
maintaining low time and space usage. The first and most
powerful pruning method is the implementation of "best
lists". Best lists, which are used in the HARPY speech
understanding system, are lists of optimal nodes at each
step in the network path. In effect, best lists define the
"beam" of the search. By limiting the size of the best lists
to 20, the system is able to save both search time and state
storage. The second pruning technique that has been
implemented is quality thresholding. By restricting the best
list to those nodes whose likelihoods are above a given
threshold, search time is reduced without noticeable loss of
labeling accuracy.
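
The two pruning methods can be sketched with a small generic
beam search. This is not the ARGOS or HARPY code: the network is
abstracted as a successor function, scores are taken to be
additive log-likelihoods, and the quality threshold is applied
relative to the best candidate at each step, which is an
assumption, since the text says only that likelihoods must be
above a given threshold. All names are hypothetical.

```python
import heapq


def beam_search(start_nodes, successors, log_likelihood, n_steps,
                beam_width=20, quality_margin=None):
    """Return (score, path) for the best path found with pruned best lists.

    successors(node) yields the nodes that may follow `node`.
    log_likelihood(step, node) scores labelling `step` with `node`.
    quality_margin, if given, drops candidates scoring more than this
    far below the best candidate of the same step.
    """
    best_list = [(log_likelihood(0, n), [n]) for n in start_nodes]
    for step in range(1, n_steps):
        candidates = {}
        for score, path in best_list:
            for nxt in successors(path[-1]):
                new_score = score + log_likelihood(step, nxt)
                # Keep only the best path arriving at each node.
                if nxt not in candidates or new_score > candidates[nxt][0]:
                    candidates[nxt] = (new_score, path + [nxt])
        # Best list: at most beam_width nodes survive this step.
        ranked = heapq.nlargest(beam_width, candidates.values(),
                                key=lambda c: c[0])
        # Quality thresholding: drop nodes far below the current best.
        if quality_margin is not None and ranked:
            floor = ranked[0][0] - quality_margin
            ranked = [c for c in ranked if c[0] >= floor]
        best_list = ranked
    return max(best_list, key=lambda c: c[0])
```

The work per step is bounded by the beam width times the
branching factor, whatever the size of the network, which is why
pruning of this kind matters for very large networks.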

ARGOS has also benefited from the addition of a new
texture operator called contrast which is derived from the
4th moment. The low-level system now consists of this
heavy-texture operator and a low-texture operator which is
derived from the median. Each of these operators is applied
to the red, green, and blue bands, yielding a feature vector
with 6 components.
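
The shape of this 6-component feature vector can be sketched as
follows. The exact definitions of the two operators are not
given here, so the sketch assumes the plain fourth central
moment and the plain median of the values in a window; the
function names are hypothetical.

```python
from statistics import mean, median


def fourth_moment(values):
    """Fourth central moment of the values in a window."""
    m = mean(values)
    return mean((v - m) ** 4 for v in values)


def feature_vector(window_rgb):
    """(4th moment, median) for each of the red, green and blue bands."""
    features = []
    for band in zip(*window_rgb):  # regroup (r, g, b) pixels into bands
        features.append(fourth_moment(band))
        features.append(median(band))
    return features
```

For a window given as a list of (r, g, b) pixels the result is
ordered red moment, red median, green moment, green median, blue
moment, blue median.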

It is expected that ARGOS can start working with very
large networks within the near future. These networks will
enable the system to perform scene identification and
orientation on arbitrary images of downtown Pittsburgh,
which is the current micro-world.

IMAGE  FEATURE  ANALYSIS  AND  SEGMENTATION 

In the area of low level vision recent work has dealt
with problems that occur when red, green, and blue input
data from natural scenes are transformed into the
approximately psychological coordinates of normalized color,
hue, and saturation (Kender, 1977). Results indicate that
linear transformations (as in the television industry's Y, I,
and Q) or near-linear transformations (as in the Hering
theory of color perception) are more satisfactory
alternatives in the digital environment.
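
One kind of problem that can arise is illustrated below with
Python's standard colorsys module. The example is ours, not
taken from the cited work, and it uses the module's HSV hue as a
stand-in for the hue coordinate discussed above: two almost grey
pixels that differ by one small step receive very different
hues, while their Y, I, Q coordinates barely move.

```python
import colorsys


def hue_and_yiq(r, g, b):
    """Hue (0..1) and linear Y, I, Q coordinates of one RGB pixel (0..1)."""
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue, colorsys.rgb_to_yiq(r, g, b)


# Two nearly identical, almost grey pixels.
hue_a, yiq_a = hue_and_yiq(0.50, 0.50, 0.51)
hue_b, yiq_b = hue_and_yiq(0.51, 0.50, 0.50)

hue_jump = abs(hue_a - hue_b)
yiq_jump = max(abs(p - q) for p, q in zip(yiq_a, yiq_b))
```

Because hue is a ratio of small differences near the grey axis,
quantization noise of one level is enough to swing it between
blue and red; a linear transform spreads the same noise evenly.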

Work also has been done in comparing various texture
operators' relative performance on natural scenes. One
tentative result is that the use of microedges per unit area
as an approximation to amount of texture ("busyness") can be
simplified (and better justified) by a somewhat different, but
faster algorithm. The microedge/area operator is the result
of the composition of several other operators: an edge
detection, followed by a threshold, followed by an average,
followed by another threshold. The choice of the last
threshold is often difficult, as the process yields an
exponential-looking distribution. By using a modified edge
detector that monotonically emphasizes high strength edges,
the resulting exaggerated values (when similarly averaged
over the same unit area) empirically are found to have
"nicer" distributions. That is, they seem to exhibit naturally
occurring minima; thresholding at these points has yielded
results which subjectively seem equivalent to the
microedge/area procedure. Work is continuing on this and
other texture transforms and measures.
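
The composition that defines the microedge/area operator can be
sketched in one dimension. This is a simplified illustration
and not the operator used in the experiments: the edge detector
is assumed to be an absolute first difference, the average a
moving mean truncated at the ends, and all names and threshold
values are hypothetical.

```python
def edge_strength(signal):
    """Absolute first difference, a minimal 1-D stand-in for an edge detector."""
    return [abs(b - a) for a, b in zip(signal, signal[1:])]


def threshold(values, level):
    """1 where the value exceeds `level`, else 0."""
    return [1 if v > level else 0 for v in values]


def moving_average(values, width):
    """Mean over a window of `width` samples, truncated at the ends."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def busy_regions(signal, edge_level, width, density_level):
    """Microedges per unit length: edge, threshold, average, threshold."""
    microedges = threshold(edge_strength(signal), edge_level)
    density = moving_average(microedges, width)
    return threshold(density, density_level)
```

The final density_level is the threshold described above as
difficult to choose; the modified procedure replaces the first
threshold with a monotonic emphasis of strong edges so that the
averaged values show minima at which to cut.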

CHANGE  DETECTION 

We plan to continue experiments in symbolic
registration and change detection. As changes due to
perspective and scale become more and more dominant, it
becomes desirable to view the problem of registration as
one of search involving constraint satisfaction based on
spatial relationships. We think the model presented in Rubin
and Reddy (1977) would also be useful in this case. The
annual progress report by CDC (available at this workshop)
describes the progress to date on the cooperative image
registration research.

IMAGE DATABASE

If we are to have adequate performance and error
analysis tools and tools for knowledge source generation, it
is desirable to manually (or interactively) generate symbolic
descriptions of the images to be analyzed. This and other
considerations have led us to begin to develop a unified
symbolic and signal image database system. The structure
of this database is described in McKeown and Reddy
(1977a). We have concentrated our efforts on the
generation of hand-segmented and labeled scenes of
downtown Pittsburgh. We have twenty such pictures and
they are currently the working set for the ARGOS system.
A major application of the MIDAS sensor database to the
area of performance evaluation of image understanding
systems is described in McKeown and Reddy (1977b in this
workshop).

ARCHITECTURES  FOR  IMAGE  PROCESSING 

It is estimated that we will need processing power of
the order of 1 to 10 billion instructions per second in digital
image processing systems requiring rapid response times.
We are attempting to develop (in cooperation with CDC) new
problem-oriented high speed digital processor architectures
for image processing. Given that C.mmp and MIPS are
closely coupled multiprocessing systems, we are exploring
issues of algorithm decomposition and parallel-pipeline
system structures for image processing. Another aspect
under study is the development of a special instruction set
for image processing using the writable control store
available with the PDP-11 processors on C.mmp and MIPS.


MIPS HARDWARE ORGANIZATION

Figure 1 shows the hardware organization for our
Multi-sensor Image Processing System (MIPS). The basic
structure is organized around a 4x4 crosspoint memory
switch, a prototype for the 16x16 switch used on C.mmp
(Wulf and Bell, 1972). Each of the four memories may be
accessed through any of four ports on an arbitrated real-time
basis. The bandwidth for each port is about 30
megabits/sec. This design allows concurrent use of memory
by four processors. Currently PDP-11/40E processors
(Fuller et al., 1976) equipped with writeable control store
are connected to two of the ports. The writeable
microstore permits the implementation of special image
processing instructions. The remaining ports are dedicated
to a high bandwidth (about 50 megabits/sec) raster scan
color display system. This system displays up to a maximum
of 657 pixels by 488 lines (NTSC standard color video) with
programmable intensity resolution between 1 and 8 bits per
pixel. A color map allows an 8 bit color code to be mapped
into 3 ten-bit color fields for pseudo-color generation,
gamma correction, and data compression. Peripherals such
as a 9 track magnetic tape drive, Gould printer and graphics
display processor are available as well as three large disk
structures totaling 600 megabytes of online storage.

There is 1 megabyte of memory which is organized
into four partitions accessible through four ports controlled
by a fast crosspoint switch. The switch allows concurrent
processing by display hardware and the multi-processors.
For example, this organization allows an image frame buffer
to be displayed from memory while processors change
pixels in the frame. The memory size was chosen so that
two images can be held in core simultaneously. This
eliminates possible waiting for data by the processors, since
bandwidth considerations between disk and memory
preclude fast massive transfer of data. Three 200
megabyte disk storage structures provide space for a large
online database for image analysis and interpretation. The
disk structures are connected to memory by two
independent channel controllers which permit a transfer rate
of 16 megabits/sec.
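
A short calculation shows why the disk rate is the limiting
factor. It assumes, for illustration only, that an image is one
full display frame at the maximum 8 bits per pixel and that the
quoted 16 megabits/sec is sustained; neither assumption is
stated above.

```python
PIXELS_PER_LINE, LINES, BITS_PER_PIXEL = 657, 488, 8  # display limits
DISK_RATE_BITS_PER_S = 16e6                            # disk-to-memory rate

# Size of one assumed full-resolution, 8-bit frame.
frame_bits = PIXELS_PER_LINE * LINES * BITS_PER_PIXEL

# Time to bring one such frame from disk into memory.
seconds_per_frame_from_disk = frame_bits / DISK_RATE_BITS_PER_S
```

Under these assumptions a frame is about 2.56 megabits and takes
about 0.16 second to load, so a processor that had to fetch
each image from disk on demand would spend much of its time
waiting; keeping the working images in memory avoids this.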

Much of our time since the last
Workshop has been spent bringing up the UNIX operating system on the
MIPS machine. Currently a single processor system with full
memory and 400 megabytes of online storage is operational.
Communication links are available to a front end terminal
concentrator allowing users to communicate from a variety
of locations and terminals. A picture processing package
(PICPAC) is being implemented in the C language and will
provide uniform picture access and display functions.

KNOWLEDGE  ACQUISITION 

Given the paucity of ideas about type and nature of
knowledge used in visual perception, we are continuing our
protocol analysis studies in human visual perception.
Studies in progress include picture puzzles (Akin and Reddy,
1977), perception as a function of distance, perception in
the presence of contradiction, and peep-hole perception
studies.

A major dimension used in processing information
selectively is stimulus resolution. The overall structure and
the fine detail found in the visual environment can be used
separately or together to understand its different aspects.
In order to understand the use of detail in visual perception
an experimental paradigm has been devised. The subjects
are required to examine the different size projections of
slides of natural scenes. They verbally report what they
see during the experiments. The projected images are
looked at from 55 feet with six image sizes ranging from
3-1/2" by 5" to 19" by 27".

Initial results indicate that the lack of detailed
information in the smaller sizes distorted the scenes beyond
recognition (Akin et al., 1977). The minimum size at which
each scene was correctly perceived correlated with the
"commonplace"-ness of the overall structure of each scene.
Scenes with the sky-buildings-river or person-with-objects-in-the-background
structure fit well the image structures
ordinarily expected by the subjects. On the other hand, in
scenes where the camera was positioned overhead, the
structure of the scenes was not recognized in any of the
lower three levels. At the lowest levels, scene properties
such as edges, textures, and color which were below
threshold values of resolution with respect to the size of
the receptors in the retina were perceived erroneously.

CONCLUSION 

The representation, use, and evaluation of knowledge
within an image understanding system requires parallel
development of various components. The research program
at CMU represents an attempt to study several different
facets of the image understanding problem in a specific
problem context, i.e., the 3-D downtown Pittsburgh task.


ABSTRACT

This report summarizes progress made
during the period March-September, 1977 in
the research being conducted under Contract
DAAG53-76C-0138 (DARPA Order 3206). This
project is devoted to the development of
algorithms for automatic target cueing in
FLIR imagery, and to the design of CCD circuits
that implement these algorithms. It
is a joint effort of the Computer Vision
Laboratory at the University of Maryland
(Principal Investigators: Profs. David L.
Milgram and Azriel Rosenfeld) and the
Westinghouse Defense and Electronics Systems
Center, Systems Development Division,
Baltimore, MD (Program Director: Dr.
Glenn E. Tisdale). The project is being
monitored by Mr. John Dehne and Dr. George
Jones of the U. S. Army Night Vision Laboratory,
Ft. Belvoir, VA.


INTRODUCTION 

The project reviewed in this status
report was initiated on May 1, 1976.
Accomplishments during the first ten
months were summarized in the previous
status report (as of March, 1977), which
was included in the Proceedings of the
April 1977 Image Understanding Workshop
[1]. Further information on this work can
be found in the Second Semi-Annual Report
on the project [2], covering the period 1
November 1976 - 30 April 1977.

IMAGE  MODELING 

Two  studies  have  been  conducted  to 
analyze  the  outputs  of  edge  detection  and 
thresholding  operators  applied  to  an  image 
region.  These  studies  are  of  interest  in 
connection  with  estimating  the  false  alarm 
rates  associated  with  image  segmentation. 

In the first study [3], the input
image is treated as a stationary random
field from a context-independent ensemble.
A statistical analysis of the responses of
various edge detection operators, including
gradients and the Laplacian, to such an
image has been conducted. Various stochastic
properties of the outputs were predicted,
and the results compared with outputs
obtained from real and synthetic images.

The second study [4] investigated
the results of thresholding a noise image,
modelled as a two-dimensional random
process completely characterized by its
mean and power spectrum. A statistical
analysis of the thresholded output has
been carried out, and various properties
of this output have been derived.
Comparisons have been made with the results
of thresholding noise images that have
been smoothed to varying degrees.

OBJECT  EXTRACTION 

Several different approaches to
extracting objects from a FLIR image have
been investigated. One class of
approaches is based on thresholding the
image using thresholds derived from (gray
level, edge strength) scatter diagrams.
These thresholds are usually quite good
for extracting single objects, but a more
refined approach is needed to handle
multiple objects. One such approach
makes use of scatter diagrams based on
the local maxima of edge strength, rather
than on all the edge strength values; it
is described in greater detail in [5].

A second general approach to object
extraction is based on detecting
coincidences between the borders of
above-threshold connected components and the
points of high edge value. An implementation
of this approach, called "Superslice",
has yielded good segmentations of
FLIR windows. Several improvements have
also been investigated, e.g., a technique
which takes into account how well the
high-edge-value points surround the
connected region. Further details on this
technique can be found in [5].
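
The border/edge coincidence test can be illustrated with a minimal
sketch. It is not the Superslice implementation of [5]: the function
names, the use of 4-connectivity, and the three threshold parameters
are all assumptions made for this example. A component is kept when a
sufficient fraction of its border pixels carry a high edge value.

```python
from collections import deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumed)

def components(mask):
    """Return the 4-connected components of True cells as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append(comp)
    return found

def border_edge_match(comp, edge, edge_min):
    """Fraction of a component's border pixels with edge value >= edge_min."""
    cells = set(comp)
    border = [(y, x) for (y, x) in comp
              if any((y + dy, x + dx) not in cells for dy, dx in NEIGHBORS)]
    hits = sum(1 for (y, x) in border if edge[y][x] >= edge_min)
    return hits / len(border)

def superslice_like(gray, edge, gray_thresh, edge_min, min_match):
    """Keep above-threshold components whose borders coincide with strong edges."""
    mask = [[value >= gray_thresh for value in row] for row in gray]
    return [comp for comp in components(mask)
            if border_edge_match(comp, edge, edge_min) >= min_match]
```

In this sketch an isolated bright pixel with no edge support is
rejected, while a bright block ringed by strong edge responses is kept.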

The Superslice technique can also be
used as part of an iterative threshold
selection scheme. Specifically, one can
histogram the given image; pick a
threshold; use Superslice to extract
components whose borders have high edge
values; discard these components,
rehistogram, and repeat the process. Experiments
with this approach are also described in
[5].

Studies of optimal edge detection
techniques have also been conducted. In
particular, the analysis underlying the
Hueckel edge detection scheme has been
re-examined, and some modifications in this
scheme have been made. A class of
operators analogous to the Hueckel operator has
also been defined. This work will be
described in two forthcoming technical
reports.

REGION  ANALYSIS  AND  TRACKING 

The algorithms for analyzing the
connected component structure of extracted
image segments have been closely examined
in order to clarify various problems
associated with their CCD implementation.
In particular, a one-pass algorithm has
been developed that constructs a tree
structure for a thresholded image in which
nodes correspond to connected components
of object or background and in which the
parent relation is based on region
enclosure. In addition, the algorithm labels
each region, computes a set of features for
it, and computes the chain code of its
outer boundary. The details of this
algorithm can be found in [6].
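
As a rough illustration of region labeling with feature computation,
the sketch below uses the classical raster-scan labeling with
union-find, followed by a second pass that resolves label equivalences.
This is a substitute technique, not the one-pass algorithm of [6]: it
does not build the enclosure tree or the chain codes, and the function
name and the choice of features (area and bounding box) are assumptions.

```python
def label_regions(mask):
    """Label 4-connected object regions and collect simple features.

    mask is a list of rows of truthy (object) or falsy (background) values.
    Returns (labels, features): labels holds a region id per object pixel
    and -1 for background; features maps region id to area and bounding box.
    """
    rows, cols = len(mask), len(mask[0])
    parent = []  # union-find forest over provisional labels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    labels = [[-1] * cols for _ in range(rows)]
    # Raster scan: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else -1
            left = labels[r][c - 1] if c > 0 else -1
            if up < 0 and left < 0:
                parent.append(len(parent))
                labels[r][c] = len(parent) - 1
            elif up >= 0 and left >= 0:
                a, b = find(up), find(left)
                parent[max(a, b)] = min(a, b)
                labels[r][c] = min(a, b)
            else:
                labels[r][c] = up if up >= 0 else left
    # Second pass: replace provisional labels by roots, accumulate features.
    features = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] < 0:
                continue
            root = find(labels[r][c])
            labels[r][c] = root
            f = features.setdefault(
                root, {"area": 0, "top": r, "left": c, "bottom": r, "right": c})
            f["area"] += 1
            f["left"] = min(f["left"], c)
            f["right"] = max(f["right"], c)
            f["bottom"] = r  # rows are visited in increasing order
    return labels, features
```

A U-shaped region shows why the equivalence step matters: its two arms
receive different provisional labels that are merged only when the scan
reaches the row joining them.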

The objects contained in a sequence
of images can be tracked from frame to
frame by defining a comparison function
that evaluates differences between
descriptions of object regions. One can then
apply dynamic programming to discover the
most temporally consistent region. This
region can then be removed from all frames,
and the process can be repeated. This
approach has been successfully applied to
the small sequence data base that is
currently available. The algorithm and
test results are described in [7].
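
The dynamic programming step can be sketched as follows. This is a
minimal illustration, not the algorithm of [7]: the function name, the
form of the region descriptions, and the comparison function passed in
as `cost` are assumptions. One region is chosen per frame so that the
summed frame-to-frame differences are as small as possible.

```python
def most_consistent_track(frames, cost):
    """Pick one region per frame minimizing the summed frame-to-frame cost.

    frames is a list of frames, each a non-empty list of region descriptions;
    cost(a, b) compares a region in one frame with a region in the next.
    Returns (total_cost, indices), one chosen region index per frame.
    """
    best = [0.0] * len(frames[0])  # best cost of a track ending at each region
    back = []                      # back-pointers, one list per later frame
    for prev, cur in zip(frames, frames[1:]):
        new_best, pointers = [], []
        for region in cur:
            options = [best[i] + cost(p, region) for i, p in enumerate(prev)]
            i_min = min(range(len(options)), key=options.__getitem__)
            new_best.append(options[i_min])
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest track backwards from the last frame.
    end = min(range(len(best)), key=best.__getitem__)
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[end], path
```

To follow the scheme described above, the regions on the returned path
would be deleted from their frames and the function called again to
find the next most consistent region.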

CIRCUIT DESIGN AND IMPLEMENTATION

Westinghouse has continued to study
the CCD focal plane implementation of the
algorithms developed on the project. In
particular, much effort has been devoted
to designing implementations of the
connected component labeling process, as
described in [8].

Westinghouse has also implemented a
circuit that functions as a sorter. This
circuit can be used as a basis for
implementing histogramming and median filtering
operations. A description of it can be
found in [9].
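
The link between sorting and these two operations can be shown with a
small software sketch. It says nothing about the CCD circuit of [9];
the function names, the one-dimensional window, and the odd window
width are assumptions. The median is the middle element of a sorted
window, and a histogram is the list of run lengths in the sorted data.

```python
def median_filter_1d(samples, width=3):
    """Median-filter a sequence by sorting each window (width assumed odd).

    Samples closer than width // 2 to either end are left unchanged.
    """
    half = width // 2
    out = list(samples)
    for i in range(half, len(samples) - half):
        window = sorted(samples[i - half:i + half + 1])
        out[i] = window[half]  # middle element of the sorted window
    return out

def histogram_from_sorted(samples):
    """Return (value, count) pairs, read off as run lengths of the sorted data."""
    runs = []
    for value in sorted(samples):
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]
```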

Registering Images

Image understanding must start with the image. We have
therefore devoted considerable attention to understanding the
image formation process and to exploiting the constraints
involved. Image registration, in particular, has been a focus,
since putting a new image into correspondence with map
coordinates is a certain prerequisite to further processing.

Image registration can be approached in a number of
ways. One method requires the discovery and use of very
prominent features. Another involves correlation of the image
against a synthetic image made from a digital terrain model
using the reflectance map to determine the correct intensities
from combinations of surface slope, surface material, and sun
position.

Horn has shown that the correlation method produces
outstanding results, potentially yielding registration accuracy in
the subpixel range. Working with his student, Brett Bachman,
he devised a method for coping with the computational
problems that at first seem to make correlation unacceptably
slow. Basically the method involves approximating registration
first with low resolution real and synthetic images before going
on to high resolution and high accuracy. At every stage, this
keeps the hill-climbing space reasonably free of distracting local
maxima. Details can be found in Horn's paper elsewhere in this
collection.
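
The coarse-to-fine idea can be sketched in one dimension. This is only
an illustration under stated assumptions, not the method of Horn and
Bachman: the signals are 1-D lists, the match score is a mean product
of overlapping samples, and the names `shrink`, `score`, and `register`
are invented for the example. A wide search is made only at the lowest
resolution; each finer level searches one sample either side of the
doubled coarse estimate.

```python
def shrink(signal):
    """Halve the resolution by averaging neighboring pairs of samples."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

def score(real, synth, shift):
    """Mean product of overlapping samples with synth moved right by shift."""
    pairs = [(real[i], synth[i - shift])
             for i in range(len(real)) if 0 <= i - shift < len(synth)]
    if not pairs:
        return float("-inf")
    return sum(a * b for a, b in pairs) / len(pairs)

def register(real, synth, levels=2, coarse_range=4):
    """Estimate the shift that aligns synth with real, coarsest level first."""
    pyramid = [(real, synth)]
    for _ in range(levels):
        r, s = pyramid[-1]
        pyramid.append((shrink(r), shrink(s)))
    # Wide search at the lowest resolution only.
    r, s = pyramid[-1]
    shift = max(range(-coarse_range, coarse_range + 1),
                key=lambda d: score(r, s, d))
    # Refine at each finer level around the doubled coarse estimate.
    for r, s in reversed(pyramid[:-1]):
        center = 2 * shift
        shift = max(range(center - 1, center + 2), key=lambda d: score(r, s, d))
    return shift
```

With two levels of reduction, a search over 9 coarse shifts plus 3
shifts at each finer level replaces a search over dozens of shifts at
full resolution.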

Synthetic Images

The synthetic images generated as part of the registration
processes have many other uses. Making shaded relief maps is
one application, of course, and we have made a number from
both high-altitude and low-altitude points of view. Horn's
paper, just mentioned, has samples of the high-altitude variety.

Making the low-altitude images involves some
complications because switching to an oblique viewing angle
introduces a perspective and an interpolation problem. Another
student, Tom Strat, has worked on this problem with Horn,
testing a variety of algorithms for speed and quality of the
resulting product. Work continues to improve the resolution
that can be achieved with reasonable computing time.

Multiple  Sun  Maps 

Horn has just begun another map-making project that involves
the use of two or more imaginary suns distributed at key points
around the sky. Such maps give a faster, better appreciation of
the terrain than ordinary shaded maps. Intuitively, it is
clear that north-facing slopes will be blue if the imaginary sun
to the north is the blue one.

Formally, the reason is that the intensity of a point in an
ordinary black-and-white image is not sufficient to determine
the surface normal of the surface corresponding to that point.
There is constraint, however, and one black-and-white intensity
does limit the normal to lie on a definite curve in the reflectance
map. Using multiple suns, each of a different color in a
different part of the sky, two or more separate curves in
reflectance-map space are obtained. Their intersection gives the
surface normal unambiguously.
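
A special case makes the intersection concrete. The sketch below
assumes a Lambertian surface and exactly three suns whose directions
are not coplanar, assumptions that are ours and not taken from Horn's
project. Each intensity then gives one linear equation s . n = I in the
normal n, and the three equations are solved by Cramer's rule. The
function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def normal_from_three_suns(suns, intensities):
    """Recover a unit surface normal from three Lambertian intensities.

    suns holds three unit vectors pointing toward the (assumed) suns;
    intensities holds the matching brightness values s . n for one point.
    """
    d = det3(suns)
    if abs(d) < 1e-12:
        raise ValueError("sun directions must not be coplanar")
    n = []
    for k in range(3):
        m = [list(row) for row in suns]
        for r in range(3):
            m[r][k] = intensities[r]  # Cramer's rule: replace column k
        n.append(det3(m) / d)
    length = math.sqrt(sum(v * v for v in n))
    return [v / length for v in n]  # normalizing also removes a uniform albedo
```

With only two suns the two curves may cross at more than one point, so
this sketch uses a third sun to keep the answer unique.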

Soon we hope to combine two slope-indicating colors with
a third indicating altitude in order to pack in still more
information.

The  Albedo  Map 

Once registration is under control, several related things can be
done starting with the same reflectance-map based technology.
Change detection and ground cover analysis are two things to
which we will be devoting considerable attention. We believe
that the ratio of real image intensity to synthetic image intensity
will be a good index to ground cover. This ratio does not
depend much on sun position, unlike other measures used up to
now. We call an image made up of these ratios an albedo map.
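
A minimal sketch of the ratio computation follows. It assumes the real
and synthetic images are already registered and stored as equal-sized
lists of rows; the function name and the `floor` guard for shadowed
pixels are assumptions made for the example.

```python
def albedo_map(real, synthetic, floor=1e-6):
    """Per-pixel ratio of real to synthetic intensity.

    Pixels whose synthetic intensity is at or below floor (for example,
    slopes in shadow) carry no information and are returned as None.
    """
    return [[(r / s if s > floor else None) for r, s in zip(real_row, synth_row)]
            for real_row, synth_row in zip(real, synthetic)]
```

If the real intensity is modeled as ground cover reflectance times the
shading predicted by the synthetic image, the ratio returns the
reflectance whatever the sun position, which is the property claimed
above.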

Two theses were completed during the summer having to do
with motion understanding. The first, by Shimon Ullman,
produced a number of results: the first involved the
demonstration that five points on a rigid object seen in three
views are enough to determine the relative three-dimensional
position of the five points in each of the views; the second
involved facts having to do with matching features in one view
to those in another view made a short time later. Importantly, a
family of experiments demonstrated that the feature
correspondence is carried out before features are grouped
together into objects. This is an existence proof that a machine
can get at distance information using motion without first
forming objects and without the obstacles that this involves.

The second thesis on motion vision, by Mark Lavin,
investigated the problem of low-level navigation in hilly terrain.
Lavin's input consisted of line drawings of hills taken by a
moving vehicle whose position, speed, and orientation change.
The output, as shown in Figure 1, is a map showing the location
of the hills observed and the position of the vehicle at each
point a snapshot is taken. To do this, Lavin combined Ullman's
mathematical results with his own hill-matching program.
Extensive error analysis led to accurate algorithms.


Shape  From  Shading 

Bob Woodham completed a thesis that was directed at the
theoretical problem of getting surface-orientation information
from intensity and at the practical problem of analyzing metal
castings for defects. On the theoretical side, he was able to
exhibit a range-constriction algorithm that does a shape analysis
without the integrations required by Horn's previous methods.
On the application side, he applied Marr's primal sketch
operators to the problem of determining casting grain size, and
his programs equaled human performance on flat surfaces. As
with understanding aerial photographs, part of the problem is
registering real images with models. Figure 2 shows a synthetic
image of a cast shuttlecock made in preparation for such
registration. Note the specular highlight.


The information in Marr's primal sketch seems to have a major
role in determining texture. This is important because texture
gradients help build the 2 1/2 D sketch and because classifying
textures and discriminating among them is important by itself.

Bruce Schatz has finished a thesis that argues that texture
is determined by first order statistics on a subset of the primal
sketch descriptors. It seems that only ungrouped edge fragments
and virtual lines connecting neighboring edge points are needed.

Moreover, it seems that the analysis of these
texture-determining descriptors can be quite coarse. Mike Riley
has shown that histograms of line orientation with only five or
six buckets seem sufficient for handling the line-orientation part
of texture discrimination. Figure 3 shows one of Riley's
experimental images. The figure consists mainly of a large
square in which the line segments are at random orientations.
Inside it, there is a subsquare of about one fourth the size in
which the line segments have only three orientations. Curiously,
the subsquare is not readily discernible, thus arguing for the
hypothesis that the texture discrimination machinery in humans
is not very refined.
This is encouraging since it suggests machines can do the same
without exorbitant computation. Related experiments seem to
show that intensity histograms can be quite coarse as well,
although work in this direction is preliminary.
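
A coarse orientation histogram of the kind described can be sketched as
follows. The segment format, the function names, and the distance
measure are assumptions for illustration and are not taken from Riley's
experiments. Orientations are folded into the range 0 to 180 degrees,
since a line segment has no direction, and counted in six buckets.

```python
import math

def orientation_histogram(segments, buckets=6):
    """Count line segments by orientation, folded into [0, 180) degrees.

    Each segment is a pair of endpoints ((x0, y0), (x1, y1)).
    """
    counts = [0] * buckets
    width = 180.0 / buckets
    for (x0, y0), (x1, y1) in segments:
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
        counts[min(int(angle // width), buckets - 1)] += 1
    return counts

def histogram_distance(a, b):
    """Sum of absolute differences of normalized histograms (0 same, 2 disjoint)."""
    total_a, total_b = sum(a), sum(b)
    return sum(abs(x / total_a - y / total_b) for x, y in zip(a, b))
```

Two texture patches would be compared by computing one histogram per
patch and thresholding the distance between them.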

Representation

We are committed to the idea that good representation is the key
to successful image understanding. The reflectance map and the
albedo map are examples of good representations oriented
toward understanding and exploiting shading. The primal
sketch, the 2 1/2 D sketch, and the generalized cones invented by
Binford at Stanford University are examples oriented toward
recognition.

Each of these representations was devised to make some
particular kind of information explicit. Each in turn helps to
define the computational problems that eventually lead to
working algorithms.

With the basic work on the primal sketch complete,
attention has been turned to the 2 1/2 D sketch and to the
question of making it from information in the primal sketch.
Our current design calls for representation of depth and surface
orientation as well as contours of discontinuity in these
quantities. Additional internal computational structure will
maintain consistency between them all.

At the generalized-cone level of representation, Marr and
Keith Nishihara continue to work out criteria for applicability
and means by which hierarchies of descriptions can be
assembled together using body-centered coordinate systems.

One seemingly important theorem discovered by Marr
states the following: given some simple continuity assumptions,
if the points on a surface that correspond to the observed
boundary of that surface all lie in a plane, then the surface is
part of a generalized cone.